Self-incompatibility gene.

DNA sequences of S-genes which encode S-proteins and control the self-incompatibility reaction in
gametophytic self-incompatible plants have been identified. The DNA sequences encoding several
S-proteins of N. alata and their attendant signal sequences are specifically provided. Regulatory
sequences which direct expression of the S-genes in reproductive tissue of self-incompatible plants
have also been identified. A method for the identification and isolation of cDNA and genomic DNA
coding sequences of the S-genes is described.
Description

SELF-INCOMPATIBILITY GENE

This is a continuation-in-part of U.S. Patent Application Serial No. 854,139, filed April 21, 1986, which in turn
is a continuation-in-part of U.S. Patent Application Serial No. 792,435, filed October 29, 1985, now abandoned.

Field of the Invention

This invention relates to the identification and isolation of cDNA and genomic DNA coding sequences of an
S-gene which controls self-incompatibility in a wide variety of self-incompatible plants, particularly exemplified
by members of the Solanaceae. Studies of S-gene products, S-proteins, indicate that they are associated with
the expression of the self-incompatibility genotype of such self-incompatible plants.

S-proteins are useful in control of pollen tube growth, for example as natural gametocides to control, induce
or promote self-incompatibility and interspecific incompatibility. S-genes and their products can also be used
in genetic manipulation of plants to create self-incompatible cultivars. Plants engineered in this way will be
valuable for the economic production of hybrid seed.

Background of the Invention 

Many plant species, including Nicotiana alata and Lycopersicon peruvianum, are self-incompatible, that is,
they cannot be fertilized by pollen from themselves or by that of a plant of the same S- (or self-incompatibility)
genotype. The molecular basis of self-incompatibility is believed to arise from the presence of S-protein in the
mature styles of plants; in particular, as exemplified by N. alata and L. peruvianum, S-protein has now been
shown to be present in extracts of plant styles at the developmental stages of buds at first show of petal color,
and at the subsequent stages of maturation of open but immature flowers, and flowers having mature
glistening styles. On the other hand, S-protein is not present in the earlier developmental stages of green bud
and elongated bud.

For general reviews of self-incompatibility, see de Nettancourt (1977) Incompatibility in Angiosperms,
Springer-Verlag, Berlin; Heslop-Harrison (1978) Proc. Roy. Soc. London B 202:73; Lewis (1979) N.Z. J. Bot.
17:637; Pandey (1979) N.Z. J. Bot. 17:645 and Mulcahy (1983) Science 220:1247. Self-incompatibility is defined
as the inability of fertile hermaphrodite seed plants to produce zygotes after self-pollination. Two types of
self-incompatibility, gametophytic and sporophytic, are recognized. Gametophytic incompatibility is most
common and in many cases is controlled by a single nuclear gene locus (S-locus) with multiple alleles. Pollen
expresses its haploid S-genotype and matings are incompatible if the S-allele expressed is the same as either
of the S-alleles expressed in the diploid tissue of the pistil. During both incompatible and compatible matings,
pollen tubes germinate and grow through the stigma into the transmitting tissue of the style. Tube growth from
incompatible pollen grains is arrested in the upper third of the style.

In sporophytic incompatibility, pollen behavior is determined by the genotype of the pollen-producing plant.
If either of the two S-alleles in the pollen parent is also present in the style, pollen tube growth is inhibited.
Unlike the gametophytic systems, inhibition usually occurs at the stigma surface and not in the style. In
sporophytic incompatibility, S-protein may be concentrated at or near the stigma surface. The gametophytic
polyallelic system is considered to be the ancestral form of self-incompatibility in flowering plants, with the
sporophytic system being derived from it (de Nettancourt 1977, supra). The products of the S-gene in the two
systems are considered to be structurally related.
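
The two compatibility rules described above can be sketched in a few lines of Python. This is only an illustration of the stated genetics, not part of the disclosed method: the function names are hypothetical, allele labels are plain strings, and the sporophytic rule is simplified exactly as in the text (it ignores dominance relationships between alleles).

```python
from typing import Sequence


def gametophytic_compatible(pollen_allele: str, pistil_genotype: Sequence[str]) -> bool:
    """Gametophytic rule: the pollen grain expresses its own haploid S-allele
    and is rejected if that allele matches either allele of the diploid pistil."""
    return pollen_allele not in pistil_genotype


def sporophytic_compatible(pollen_parent_genotype: Sequence[str],
                           pistil_genotype: Sequence[str]) -> bool:
    """Sporophytic rule (simplified): pollen behaves according to the diploid
    genotype of the pollen parent and is rejected if either parental allele
    is also present in the style."""
    return not (set(pollen_parent_genotype) & set(pistil_genotype))


if __name__ == "__main__":
    pistil = ("S2", "S3")
    # An S1S2 pollen parent sheds S1 and S2 pollen grains.
    for allele in ("S1", "S2"):
        print(allele, "pollen, gametophytic:", gametophytic_compatible(allele, pistil))
    print("S1S2 parent, sporophytic:", sporophytic_compatible(("S1", "S2"), pistil))
```

Under the gametophytic rule, half of the pollen from an S1S2 plant (the S1 grains) can fertilize an S2S3 pistil, whereas under the simplified sporophytic rule all pollen from that plant is rejected.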

There are five species of gametophytically self-incompatible plants and two species of sporophytically
incompatible plants in which style or stigma proteins apparently related to S-genotype have been detected by
either electrophoretic or immunological methods. In N. alata, an association between specific protein bands
and three S-allele groups was demonstrated by isoelectric focussing of stylar extracts (Bredemeijer and Blaas
(1981) Theor. Appl. Genet. 59:185). Two major antigenic components have been identified in mature styles of a
Prunus avium cultivar of S3S4 genotype, one of which (S-antigen) was specific to the particular S-allele group
(Raff et al. (1981) Planta 153:125; and Mau et al. (1982) Planta 156:505). The S-antigen, a glycoprotein, was a
potent inhibitor of the in vitro growth of pollen tubes from an S3S4 cultivar (Williams et al. (1982) Planta 156:577).
The glycoprotein was resolved into two components, purportedly representing the S3 and S4 products of the
S3S4 genotype. Stylar protein components which have been associated with the S-allele group or the
self-incompatibility genotype are reported in Petunia hybrida (Linskens (1960) Z. Bot. 48:126), Lilium
longiflorum and Trifolium pratense (Heslop-Harrison (1982) Ann. Bot. 49:729).

A glycoprotein corresponding to genotype S7 of Brassica campestris has been isolated from extracts of
stigmas by gel-filtration followed by affinity chromatography and isoelectric focussing (Nishio and Hinata
(1979) Jap. J. Genet. 54:307). Similar techniques were used to isolate S-specific glycoproteins from stigma
extracts of Brassica oleracea plants homozygous for S-alleles S39, S22 and S7 (Nishio and Hinata (1982)
Genetics 100:641). Antisera raised to each isolated S-specific Brassica oleracea glycoprotein not only
precipitated its homologous glycoprotein but also reacted with the other two S-specific glycoproteins of B.
oleracea and the S7-specific glycoprotein of B. campestris (Hinata et al. (1982) Genetics 100:649). An
S-specific glycoprotein was isolated by Ferrari et al. (1981) Plant Physiol. 67:270 from a stigma extract of B.
oleracea using sucrose gradient sedimentation and double diffusion tests in gels in which the proteins were
identified by Coomassie Blue staining. This preparation was shown to be biologically active since pretreatment
of S2S2 pollen with the glycoprotein prevented the pollen from germinating on normally compatible stigmas.
Recently a cDNA clone encoding part of an S-locus specific glycoprotein from B. oleracea stigmas has been
described (Nasrallah et al. (1985) Nature 318:263-267).

In work that is detailed in Clarke et al., U.S. Patent Applications Serial No. 615,079, filed May 24, 1984, and
Serial No. 050,747, filed May 15, 1987, stylar extracts of several self-incompatibility genotypes from both
Nicotiana alata and Lycopersicon peruvianum were examined for the presence of S-gene associated protein.
Glycoprotein materials were identified in the 30,000 MW region of stylar extracts of genotypes S1S3, S2S3,
S2S2 and S3S3 of N. alata and of genotypes S1S2, S2S3, S1S3, S2S2, S3S3 and S3S4 of L. peruvianum. By
comparing two-dimensional gel electrophoresis of stylar extracts of the different genotypes, closely related
but distinct glycoproteins were found to segregate with the individual S-alleles. For example, the N. alata
S2-protein was found only in stylar extracts of the genotypes containing the S2-allele (S2S3 and S2S2). For
each genotype, the genotype-specific glycoprotein only appeared as the flower matured, and was detected
only in stylar extracts of buds at first show of petal color and in later stages of maturation, but not in earlier bud
stages. Therefore, the appearance of these glycoproteins is temporally coincident with the appearance of the
self-incompatibility phenotype. The S2-glycoprotein of N. alata and the S2- and S3-proteins of L. peruvianum
were shown to be more highly concentrated in the upper style sections, which is the zone in which pollen tube
inhibition occurs. Therefore, the appearance of these glycoproteins is spatially coincident with the
self-incompatibility reaction. Further, corroboration of the biological activity of the S2-protein of N. alata was
provided by its inhibition of pollen tube growth in an in vitro assay (Williams et al., 1982, supra).

A significant aspect of the work disclosed in U.S. Application Serial Nos. 615,079 and 050,747 was the
discovery that rabbit antisera and monoclonal antibodies raised to individual S-proteins or stylar extracts
showed immunological cross-reaction between S-proteins of different genotype within the same species,
between S-proteins of different species and also between species having gametophytic incompatibility and
sporophytic incompatibility. It was concluded therein that there is structural homology among S-proteins, and
that despite apparent differences in molecular weight and pI, these proteins are a recognizable structural class
in addition to their functional similarities.

These applications also reported the results of N-terminal sequencing of several mature N. alata (S2, S6, Sz
and SF11) proteins and L. peruvianum (S1 and S3) proteins. Significant amino acid sequence homologies among
these gametophytic S-proteins were found. In the region sequenced (amino acids 1-15), the N. alata S2 protein
is 80% homologous to the N. alata S6 protein, 67% homologous to the L. peruvianum S1 protein and 53%
homologous to the L. peruvianum S3 protein.
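
The percentages quoted for the 15-residue N-terminal region correspond to 12, 10 and 8 matching positions out of 15. The sketch below shows one way such a figure could be computed, assuming that "homologous" here means the fraction of identical residues in an ungapped, position-by-position comparison (an assumption; the text does not define the measure). The sequences are made-up placeholders, not the actual S-protein sequences, and `percent_identity` is a hypothetical helper.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions carrying the same residue in two
    equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Placeholder 15-residue strings (NOT real S-protein sequences).
    reference = "ACDEFGHIKLMNPQR"
    others = {
        "12 of 15 identical": "ACDEFGHIKLMNAAA",
        "10 of 15 identical": "ACDEFGHIKLAAAAA",
        "8 of 15 identical": "ACDEFGHIAAAAAAA",
    }
    for label, seq in others.items():
        print(label, "->", round(percent_identity(reference, seq)), "%")
```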

U.S. Application Serial Nos. 615,079 and 050,747 also disclosed a method of purification for S-proteins
which included fractionation of stylar extracts by ion exchange chromatography followed by a second
fractionation by affinity chromatography. The method of purification was exemplified with the isolation of the
32K S2-glycoprotein from Nicotiana alata styles.

Recent reports of the isolation and amino acid sequence of the S8, S9 and S12 proteins of Brassica
campestris show that there is extensive homology among these sporophytic S-proteins (Takayama et al.
(1986) Agric. Biol. Chem. 50:1365-1367; Takayama et al. (1986) ibid. pp. 1673-1676; Takayama et al. (1987)
Nature 326:102-105). The predicted amino acid sequence of the S6 protein of B. oleracea (Takayama et al.,
1987, supra) based on the DNA sequence of an S6 gene cDNA clone (Nasrallah et al., 1985, supra) is found to
be about 75% homologous to the B. campestris S-proteins. Comparison of the N. alata and L. peruvianum
S-protein sequences (U.S. Patent Applications 615,079 and 050,747; Anderson et al. (1986) Nature 321:38-44)
with those of the Brassica S-proteins indicates that there is no significant homology between the gametophytic
and sporophytic S-proteins.

The S-proteins that have been identified are glycoproteins, which are proteins that have been modified by
covalent bonding of one or more carbohydrate groups. Little is known of the composition and structure of the
carbohydrate portion of S-proteins. It is, as yet, unclear what contribution, if any, the carbohydrate portion of
the S-protein makes to biological activity in the incompatibility reaction. Petunia hybrida stylar mRNA is
translated in Xenopus laevis (frog) egg cells to produce active proteins which induce the incompatibility
reaction. The relative glycosylation of S-proteins produced in frog egg cells to that of the S-proteins produced
in the plant is unknown; however, the post-translational processing in the foreign system is adequate to
produce biologically active proteins (van der Donk, J. A. W. M. (1975) Nature 256:674-675).

Most proteins, such as the S-proteins, that are excreted from or transported within cells have signal or
transit sequences that function in the translocation of the protein; for example, see Perlman, D. and Halvorson,
H. O. (1983) J. Mol. Biol. 167:391-409; Edens, L. et al. (1984) Cell 37:629-633; and Messing, J. et al. in Genetic
Engineering of Plants, eds. Kosuge, T. et al. (1983) Plenum Press, New York, pp. 211-227. Signal or transit DNA
sequences are generally adjacent to the 5' end of the DNA encoding the mature protein, are co-transcribed
with the mature protein DNA sequence into mRNA and are co-translated to give immature proteins with the
signal or transit peptide attached. During the translocation process the signal or transit peptide is cleaved to
produce the mature protein.

The expression of S-genes in self-incompatible plants shows very complex regulation, with S-gene products
appearing in only certain tissues at certain times. The mechanism of this regulation is not yet known in detail,
but involves the presence of specific regulatory DNA sequences in close proximity to the genomic DNA that
encodes the S-protein. Adjacent to the structural gene and signal or transit sequences are promoter
sequences that control the initiation of transcription and exert control over protein expression levels.

Summary of the Invention

It is a goal of the present invention to isolate and characterize the S-genes of gametophytic
self-incompatible plants. Toward this goal, methods for isolating cDNA clones of S-genes have been
described and have been exemplified by their application to the isolation of near full-length and full-length
cDNA clones of the S-genes of plants of the genus Nicotiana, specifically to the isolation of cDNA clones of the
S2, S3 and S6 genes of N. alata. The methods described are generally applicable to the isolation of cDNA
clones of S-genes of gametophytic self-incompatible plants, including plants which are members of the
Solanaceae, which includes among others the genera Nicotiana and Lycopersicon.

The S-gene cDNA clones of the present invention are useful as probes for the identification of genomic
S-gene sequences which include regulatory sequences which direct expression of the S-gene products in
plant reproductive tissue including female secretory tissues and pollen. Such methods have been exemplified
by their application to the isolation of the genomic sequences of the S2 gene of N. alata. Such methods are
generally applicable to the isolation of genomic sequences of S-genes of gametophytic self-incompatible
plants. Full-length S-gene cDNA clones which can be isolated by the methods described herein contain DNA
sequence which encodes the S-gene protein including its complete signal or transit sequence. This signal
sequence functions in the extracellular translocation of the mature S-protein from the transmitting tract cells.
The transmitting tract is the tissue through which the pollen tubes grow on their way to the ovary.

The S-protein DNA coding sequences can be employed, for example, in heterologous in vivo expression
systems to direct synthesis of S-protein which can thereby be produced in significant amounts in biologically
active form to be used, for example, as natural gametocides. The DNA sequence encoding the mature
S-protein can be so employed separately or in combination with its attendant signal and/or regulatory
sequences.

Signal or transit sequences are useful in combination with adjacent DNA sequences of the mature protein in
effecting the excretion or translocation of mature protein in heterologous expression systems. Signal or transit
sequences may also enhance protein expression levels. Signal or transit sequences are useful in the
construction of chimaeric genes in which they are fused to a heterologous protein coding sequence, for
example in a recombinant vector, to direct translocation of that protein. Plant signal or transit sequences are
particularly important for use in combination with their DNA sequences or in chimaeric gene fusions with
heterologous coding sequences to target mature protein to specific organelles in plant cells or for excretion
from cells.

Near full-length cDNA clones can be employed to isolate full-length cDNA clones containing complete
coding and signal sequences.

S-gene regulatory sequences isolated as described herein are useful in combination with DNA sequences
encoding protein (i.e., structural genes) in effecting transcription of the DNA coding sequences and exerting
control over protein expression levels in heterologous expression systems. In particular, S-gene regulatory
sequences are useful for the expression of heterologous protein in reproductive tissue of plants. For example,
the S-gene regulatory sequences can be employed in the expression of toxic proteins in plant reproductive
tissue, particularly in pollen tissue. The specifically expressed toxin would function as a natural gametocide.

The present invention provides novel genetic constructs (recombinant DNA molecules and vectors)
containing DNA sequence encoding S-proteins of gametophytic self-incompatible plants. Constructs
containing S-gene signal sequences or S-gene regulatory sequences alone or in combination with S-gene
coding sequences or heterologous coding sequences are also described.

S-gene regulatory sequences, as exemplified by the S2 gene of Nicotiana alata, have been found to contain
regions highly homologous to mitochondrial DNA. The high conservation of these regions and their positioning
in the 5'-flanking region of the S-gene indicate that they function in the tissue-specific regulation of the S-gene.

In a particular aspect of the present invention, a novel method for the identification and isolation of S-gene
cDNA of a gametophytic self-incompatible plant has been provided. This method involves the steps of
preparing a cDNA library from an appropriate S-genotype of the self-incompatible plant (i.e., of an S-genotype
which expresses the S-gene to be isolated) and subjecting the cDNA library to differential hybridization
screening. The cDNA library is screened with a first cDNA probe prepared from mature style RNA of plants of
an S-genotype which expresses the S-gene to be cloned and a second cDNA probe prepared from mature
style RNA of plants of an S-genotype which is different from the S-genotype used to prepare the cDNA library
and which does not express the S-gene to be cloned. Clones which hybridize more strongly to the first probe
than to the second probe are selected. The selected clones are then employed as probes in northern blot
hybridizations of style RNA from several S-genotypes. Clones that hybridize more strongly to RNA
preparations from S-genotypes which express the target S-gene than to RNA preparations from S-genotypes
which do not express the target S-gene are selected as cDNA clones of the target S-gene. Any such cDNA
clones which are not full-length clones can be employed in conventional hybridization screening of the cDNA
library to isolate full-length clones.
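
The two selection steps of the differential screen can be illustrated as a simple filter over hybridization signals. This is a schematic sketch only: the text says clones are chosen when they hybridize "more strongly" and gives no numeric criterion, so the `min_ratio` threshold, the function names and the signal values below are all assumptions introduced for illustration.

```python
from typing import Dict, List, Set, Tuple


def select_differential(signals: Dict[str, Tuple[float, float]],
                        min_ratio: float = 2.0) -> List[str]:
    """Step 1: keep clones whose signal with the first probe (genotype
    expressing the target S-gene) exceeds the signal with the second probe
    (genotype lacking it) by at least min_ratio (assumed threshold)."""
    return sorted(
        clone for clone, (first, second) in signals.items()
        if first > 0 and first >= min_ratio * second
    )


def confirm_by_northern(northern: Dict[str, Dict[str, float]],
                        expressing: Set[str],
                        min_ratio: float = 2.0) -> List[str]:
    """Step 2: keep clones whose northern signal in every genotype that
    expresses the target exceeds the strongest signal in any genotype
    that does not."""
    confirmed = []
    for clone, by_genotype in northern.items():
        pos = [s for g, s in by_genotype.items() if g in expressing]
        neg = [s for g, s in by_genotype.items() if g not in expressing]
        if pos and min(pos) > 0 and min(pos) >= min_ratio * max(neg, default=0.0):
            confirmed.append(clone)
    return sorted(confirmed)


if __name__ == "__main__":
    # Made-up signal intensities for three hypothetical clones.
    plaque_signals = {"c1": (9.0, 1.0), "c2": (5.0, 5.0), "c3": (8.0, 0.5)}
    candidates = select_differential(plaque_signals)
    northern_signals = {
        "c1": {"S2S2": 10.0, "S2S3": 6.0, "S3S3": 0.5, "S1S3": 0.4},
        "c3": {"S2S2": 7.0, "S2S3": 1.0, "S3S3": 6.0, "S1S3": 0.2},
    }
    print(candidates)
    print(confirm_by_northern(northern_signals, expressing={"S2S2", "S2S3"}))
```

In this toy data, clone c3 passes the plaque screen but is rejected at the northern step because its signal does not track the target allele across genotypes, which is the purpose of the second step.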

It is preferred in this method that the cDNA library and the cDNA probes employed in differential screening
be prepared from mature style RNA of homozygous S-genotypes. In such a case, the first cDNA probe is
prepared from styles of the same homozygous S-genotype as the cDNA library, and the second cDNA probe
is prepared from styles of a different homozygous S-genotype. It will be readily apparent that heterozygous
S-genotypes can also be employed in this method. If probes from heterozygous S-genotypes are employed to
screen a homozygous S-genotype cDNA library, then the S-genotype of the first probe must express the
target S-gene and the S-genotype of the second probe must not express the target S-gene.

If a heterozygous S-genotype is employed to prepare the cDNA library and homozygous S-genotypes are
employed to prepare probes, then the S-genotype of the second cDNA probe must not express either of the
S-genes expressed by the styles employed to prepare the cDNA library. Further, if heterozygous S-genotype
cDNA probes are employed to screen a heterozygous S-genotype library, the S-genotype of the first probe
must express the target S-gene while the S-genotype of the second probe must not express the target
S-gene, and in addition, either both of the S-genotypes used to prepare probes must express the non-target
S-gene of the cDNA library S-genotype, or neither of the cDNA probe S-genotypes must express the
non-target S-gene of the cDNA library S-genotype.
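
The genotype constraints on the library and the two probes reduce to one rule that covers the homozygous and heterozygous cases alike: the first probe genotype carries the target allele, the second does not, and every non-target allele of the library is carried by both probe genotypes or by neither. A minimal sketch of that check follows; the function name `probes_valid` and the tuple representation of genotypes are assumptions made for illustration.

```python
from typing import Sequence


def probes_valid(library: Sequence[str], target: str,
                 first_probe: Sequence[str], second_probe: Sequence[str]) -> bool:
    """Check whether a pair of probe genotypes is usable for differential
    screening of a cDNA library of the given S-genotype."""
    lib, first, second = set(library), set(first_probe), set(second_probe)
    if target not in lib:
        raise ValueError("the library genotype must express the target S-gene")
    # The first probe must express the target; the second must not.
    if target not in first or target in second:
        return False
    # A non-target library allele present in only one probe would itself
    # give a differential signal and be confused with the target.
    for allele in lib - {target}:
        if (allele in first) != (allele in second):
            return False
    return True


if __name__ == "__main__":
    print(probes_valid(("S2", "S2"), "S2", ("S2", "S2"), ("S3", "S3")))
    print(probes_valid(("S2", "S3"), "S2", ("S2", "S2"), ("S3", "S3")))
    print(probes_valid(("S2", "S3"), "S2", ("S2", "S3"), ("S1", "S3")))
```

With homozygous probes and a heterozygous library, the rule collapses to the statement in the text that the second probe must not express either library allele, since a probe homozygous for the target cannot carry the non-target allele.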

Brief Description of the Figures 

Figure 1 illustrates the separation of stylar extracts of N. alata genotypes S2S2, S2S3, and S3S3 by
selected 2-dimensional gel electrophoresis. The protein bands associated with the two alleles are
identified.

Figure 2 provides a comparison of (A) the chemically deglycosylated mature S2 glycoprotein of N. alata
of molecular weight 26 kd, with (B) the in vitro translation products of style poly(A+) RNA, by SDS-gel
electrophoresis. Note the presence of the 27 kd molecular weight protein band only in the translation
products from mature style poly(A+) RNA. The 27 kd molecular weight translation product is slightly
larger than the chemically deglycosylated mature S2 protein, consistent with the presence of a signal
sequence in the 27 kd protein.

Figure 3 presents a comparison of the SDS-polyacrylamide gel electrophoresis of protein extracts
from ovary, style and other N. alata (S2S3) tissue. There is more similarity between the extracts of ovary
and style than between extracts of other organs and style, as shown by the protein bands visualized by
Coomassie Blue staining.

Figure 4 shows the production of a 100 bp cDNA fragment from mature style poly(A+) RNA using
synthetic oligonucleotide 14-mers as primers. One batch primed synthesis of a single 100 bp fragment
(tracks 1, 2 and 3). Tracks 4, 5, and 6 show that only the 100 bp fragment is produced with mature style
poly(A+) RNA when pooled synthetic primers are used. Only traces of the 100 bp fragment are detected
from ovary and green bud style poly(A+) RNA.

Figure 5 is a Northern blot analysis of mature style poly(A+) RNA from N. alata genotypes S3S3, S1S3,
S2S2 and S2S3, L. peruvianum genotype S1S3 and mixed genotypes from B. oleracea. Poly(A+) RNA
from N. alata S2S3 green bud style and ovary are also included. All tracks are probed with 32P-labelled
probe from the NA-2-1 clone cDNA insert encoding the N. alata S2-protein described infra.

Figure 6 contains autoradiograms of Southern hybridization blots of N. alata (N.a.) and L. esculentum
(L.e.) total and mitochondrial DNA (mtDNA) digested with HindIII in which the hybridization probe was (A)
the 1.0 kb genomic S2 gene fragment or (B) the 750 bp mitochondrial clone from N. alata. Samples of total
DNA contain 5 µg and the mtDNA samples contain approximately 200 ng. Lane 5 of panel A contains an
undigested sample of L. esculentum mtDNA. Molecular weight references in kilobase pairs are indicated.

Figure 7 contains autoradiograms of Southern hybridization blots of total DNA probed with the 750 bp
mitochondrial clone. Panel A is a long exposure autoradiogram of a blot containing total DNA of N. alata
(N.a.), L. esculentum (L.e.) and L. pennellii (L.p.). A total of 5 µg of DNA digested with HindIII was
employed in each lane. Variation in the signal of the strongly hybridizing 750 bp band in this blot is due to
different amounts of mtDNA contamination in the total DNA samples. Molecular weight markers are
indicated. Panel B is a blot containing total DNA (5 µg samples, digested with EcoRI) from six F2 progeny
from a cross between L. esculentum and L. pennellii. Arrows indicate segregating fragments.

Detailed Description of the Invention 

The following definitions apply in the specification and claims:

The S-gene protein is the product of the S-gene or S-allele. The term protein as used herein also includes
glycoprotein. Although the biochemical mechanism of the self-incompatibility reaction is not fully understood,
the S-protein is associated with the presence of self-incompatibility. Accordingly, the S-protein must (1) show
segregation with the S-allele; (2) be localized in the tissue where the incompatibility reaction is localized and
(3) occur in the appropriate plant tissue in coincidence with the expression of self-incompatibility. In addition, it
will be understood that the biological activity of the S-protein in an in vitro assay will provide corroboration that
the S-protein is itself functionally active for pollen inhibition. However, it is possible that the active component
is a modified protein or a secondary product. In such cases, biological activity of the S-protein may require the
activity of other components in order to be manifested in a bio-assay system. A mature S-protein is the
processed form of the S-protein from which the signal or transit peptide has been cleaved. This is the form of
the protein isolated from stylar tissue.

The S-gene or S-allele contains the DNA coding sequences for the mature S-proteins defined above.
Further, the S-gene contains the coding region for a signal or transit peptide and other information necessary
to the translation and processing of the S-protein. Further, the S-gene contains regulatory and promoter
sequences involved in the transcription and expression and processing of the S-protein. Plant genomic
sequences may contain introns. A full-length cDNA clone comprises the DNA sequence encoding a mature
protein and the entire signal or transit sequence.

A self-incompatible plant may have a heterozygous S-genotype in which two different S-alleles are
expressed (e.g., S1S3) or have a homozygous S-genotype in which the two alleles are the same (e.g., S1S1).

The term regulatory sequence is used herein to refer to the DNA sequences associated with an S-gene
which function to regulate tissue-specific expression of S-protein (the S-gene product) in plant reproductive
tissue. Plant reproductive tissue includes female secretory tissue (the stigma, style transmitting tissue and the
epidermis of the placenta) and pollen. Sequences which function for regulation of expression of structural
genes are most often present in the 5'-flanking region of the gene extending up to about 1 to 2 kb upstream
from the transcription start site. The 5'-regulatory sequence includes a region which is termed the promoter
which functions specifically for the initiation of transcription. Promoter sequences are necessary, but not
always sufficient, to drive the expression of a downstream gene. Eukaryotic promoters generally contain a
sequence with homology to the consensus 5'-TATAAT-3' ("TATA" box) about 10-35 bp 5' to the transcription
start site. About 30-70 bp 5' to the "TATA" box there is often another promoter component with homology to
the canonical form 5'-CCAAT-3', which in plants is sometimes replaced by an "AGGA" box, which is a region
having adenine residues symmetrically flanking the base triplet "G(or T)NG". Sequence elements associated
with modulation of expression, including expression in response to stimuli such as anaerobiosis and light, and
tissue-specific expression, are often found further upstream of the promoter region but can be found
interspersed with the promoter elements. The sequences which function to modulate when and where a gene
is expressed can comprise one or more sequence elements separated by non-functional sequence. In such
cases, the distance separating the functional sequence elements can also be important for correct regulation.
Certain sequence elements can function as on/off switches, for example inducing expression in certain tissue
and little or no expression in other tissue. Such sequence elements can function in concert with other
sequence elements which modulate the level of expression.
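
The positional relationships described above (a TATA-like motif about 10-35 bp 5' to the transcription start site, and a CCAAT motif about 30-70 bp 5' to the TATA box) can be turned into a simple scan of a 5'-flanking sequence. The sketch below is illustrative only. It assumes distances are measured between motif start positions, tolerates one mismatch in the TATA consensus, and requires an exact CCAAT match; none of these choices comes from the text, and `find_promoter_elements` is a hypothetical helper.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_elements(seq: str, tss: int,
                           max_tata_mismatch: int = 1) -> List[Tuple[int, List[int]]]:
    """Return (tata_start, [ccaat_starts]) pairs for a 5'-flanking sequence.

    seq: DNA sequence read 5' to 3'; tss: 0-based index of the transcription
    start site within seq. All positions are 0-based indexes into seq.
    """
    seq = seq.upper()
    tata, ccaat = "TATAAT", "CCAAT"
    hits = []
    # TATA-like motif starting 10-35 bp upstream of the start site.
    for t in range(max(0, tss - 35), tss - 10 + 1):
        window = seq[t:t + len(tata)]
        if len(window) == len(tata) and hamming(window, tata) <= max_tata_mismatch:
            # CCAAT motif starting 30-70 bp upstream of the TATA motif.
            upstream = [c for c in range(max(0, t - 70), t - 30 + 1)
                        if seq[c:c + len(ccaat)] == ccaat]
            hits.append((t, upstream))
    return hits


if __name__ == "__main__":
    # Synthetic 100 bp sequence: CCAAT at index 20, TATAAT at index 70.
    flank = "G" * 20 + "CCAAT" + "G" * 45 + "TATAAT" + "G" * 24
    print(find_promoter_elements(flank, tss=95))
```

Real promoter analysis would also have to account for the AGGA box variant and for regulatory elements located further upstream, which this scan does not attempt.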

Placing a structural gene under the regulatory control of a promoter or a regulatory sequence means
positioning the structural gene such that the expression of the gene is controlled by these sequences.
Promoters and regulatory sequence elements are generally positioned upstream of the genes that they
control. In the construction of a chimaeric gene in which a heterologous structural gene is placed under the
control of a regulatory sequence, it is generally preferred to position the regulatory sequence at a distance
from the gene transcription start site that is approximately the same as the distance between that sequence
and the homologous gene that it controls in its natural setting, i.e., the gene from which the regulatory
sequence is derived. As is known in the art, some variation in this distance can be accommodated without loss
of regulatory control and, in fact, certain variations can lead to improved control or higher expression levels.

A structural gene is that portion of a gene comprising a DNA segment encoding a protein, polypeptide or a
portion thereof. Structural genes may include signal or transit sequences, and may refer to a gene naturally
found within a plant cell but artificially introduced, particularly as part of a chimaeric construct in which it is
placed under the control of the tissue-specific regulatory sequences of the present invention. The structural
gene may be derived in whole or in part from a bacterial genome or episome, eukaryotic genomic or plastid
DNA, cDNA, viral DNA, or chemically synthesized DNA. Such a structural gene may contain modifications
(including mutations, insertions, deletions and substitutions) in the coding or the untranslated regions which
could affect biological activity or the chemical structure of the expression product, the rate of expression or
the manner of expression control. The structural gene may constitute an uninterrupted coding sequence, or it
may include one or more introns. The structural gene can encode a fusion protein so long as functionality is
maintained in the joining of coding sequences. The structural gene can be a composite of segments derived
from a plurality of sources. The structural gene can be a composite comprising a signal or transit sequence from
one gene and a sequence encoding a mature protein from another gene. For example, the structural gene can
be a composite having the signal or transit sequence of an S-gene and the coding region of another gene.

The term cDNA is understood in the art to denote the single stranded complementary DNA copy made by
action of reverse transcriptase on an mRNA template. Herein, the term cDNA is also used to denote any single
or double stranded DNA that is replicated from this first complementary copy. cDNA coding sequences are
distinguished from genomic DNA sequences by the potential presence of intron non-coding sequences in the
genomic DNA. In vivo, introns are removed from messenger RNA by splicing events that produce mature
mRNA. It is mature mRNA that is used in the initial preparation of cDNA by reverse transcription.
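
The relationship between genomic DNA, spliced message and first-strand cDNA can be made concrete with a toy example. The sketch below assumes the exon coordinates are already known (half-open, 0-based intervals on the coding strand) and writes the mature message in DNA letters; the sequence and the helper names are invented for illustration and do not come from any S-gene.

```python
from typing import List, Tuple

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def splice(genomic: str, exons: List[Tuple[int, int]]) -> str:
    """Join the exon intervals (start, end), end exclusive, discarding the
    intervening intron sequence, as splicing does for the mature message."""
    return "".join(genomic[start:end] for start, end in exons)


def reverse_complement(seq: str) -> str:
    """First-strand cDNA is complementary and antiparallel to the message."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Toy gene: exon 1 + intron + exon 2 (invented sequence).
    genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATAA"
    exons = [(0, 6), (18, 24)]
    message = splice(genomic, exons)           # intron-free coding sequence
    first_strand = reverse_complement(message)  # single stranded cDNA copy
    print(message)
    print(first_strand)
```

A cDNA clone therefore carries the contiguous coding sequence of `message`, whereas the corresponding genomic clone still contains the intron.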

The term recombinant DNA molecule is used herein to distinguish DNA molecules in which heterologous
DNA sequences have been artificially ligated together by the techniques of genetic engineering, for example
by in vitro ligation using DNA ligase (Maniatis, T. et al. (1982) Molecular Cloning, Cold Spring Harbor Laboratory,
Cold Spring Harbor, New York). Heterologous DNA sequences are derived from different genetic entities.

The process of cloning a DNA fragment involves excision and isolation of the DNA fragment from its natural
source, insertion of the DNA fragment into a recombinant vector and incorporation of the vector into a
microorganism or cell where the vector and inserted DNA fragment are replicated during proliferation of the
microorganism or cell. The term clone is used to designate an exact copy of a particular DNA fragment. The
term is also used to designate both the microorganism or cell into which heterologous DNA fragments are
initially inserted and the line of genetically identical organisms or cells that are derived therefrom.
The term recombinant vector is used herein to designate a DNA molecule capable of autonomous
replication in a host eukaryotic or prokaryotic cell, into which heterologous DNA sequences can be inserted
so that the heterologous sequences are replicated in the host cell. Conventional techniques known to those of
ordinary skill in the art are used to introduce the vector into its host cell (Maniatis et al., 1982, supra).
Recombinant vectors often contain a marker displaying a selectable phenotype such as antibiotic resistance
to allow selection of transformed cells.

A DNA molecule that is substantially pure will migrate as a single band in agarose or polyacrylamide gel
electrophoresis, using conventional procedures described in Maniatis et al. (1982), supra, and exemplified in
Figures 4, 6 and 7.

The term homology is used in the art to describe a degree of amino acid or nucleotide sequence identity
between polypeptides or polynucleotides. The presence of sequence homology is often used to support a
genetic or functional relationship between polypeptides or nucleotide sequences. The presence of amino acid
sequence homology between polypeptides implies homology between the DNA sequences that encode the
individual polypeptides. Since the genetic code is degenerate, the degree of homology between polypeptides
or proteins is not necessarily the same as that between the DNA sequences that encode them. The degree of
homology between polypeptides or polynucleotides can be quantitatively determined as a percent homology if
the sequences are known. In the absence of sequence information for comparison, the presence of homology
is usually determined operationally by experiment. In the case of DNA or RNA sequences, hybridization
experiments are used to determine the presence or absence of homology. Since the strength of a particular
hybridization signal depends on the experimental conditions used as well as the degree of homology, it is
convenient to define homology in relation to the experimental conditions used. We use the term substantially
homologous to denote the degree of homology that must exist between the hybridization probe and a target RNA
or DNA sequence in order to select the target sequence from a background of undesired sequences using
hybridization experiments as described herein.
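The distinction drawn above between homology at the protein level and at the DNA level can be illustrated
with a short calculation. The following is a minimal sketch only: the two 9-nucleotide sequences, the
three-residue peptide they encode and the function names are hypothetical illustrations and are not taken
from the Tables. Percent homology is computed here simply as percent identity over pre-aligned sequences of
equal length.

```python
# Minimal codon table covering only the codons used in this illustration
# (all entries follow the standard genetic code).
CODON_TABLE = {
    "CTT": "L", "TTG": "L",
    "TCT": "S", "AGC": "S",
    "CGT": "R", "AGA": "R",
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon using the small table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical sequences: both encode Leu-Ser-Arg with different codons.
dna_a = "CTTTCTCGT"
dna_b = "TTGAGCAGA"

protein_identity = percent_identity(translate(dna_a), translate(dna_b))
dna_identity = percent_identity(dna_a, dna_b)

print(f"protein identity: {protein_identity:.1f}%")  # 100.0%
print(f"DNA identity:     {dna_identity:.1f}%")      # 22.2%
```

In this contrived case the encoded peptides are identical while only 2 of 9 nucleotides match, which is the
effect of codon degeneracy described in the text.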

Except as noted hereafter, standard techniques for cloning, DNA isolation, amplification and purification, for
enzymatic reactions involving DNA ligase, DNA polymerase, restriction endonucleases and the like, and
various separation techniques are those known and commonly employed by those skilled in the art. A number
of standard techniques are described in: Maniatis et al. (1982), supra; Wu (ed.) (1979) Meth. Enzymol. 68; Wu
et al. (eds.) (1983) Meth. Enzymol. 100 and 101; Grossman and Moldave (eds.) Meth. Enzymol. 65; Miller (ed.)
(1972) Experiments in Molecular Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York; Old
and Primrose (1981) Principles of Gene Manipulation, University of California Press, Berkeley; Schleif and
Wensink (1982) Practical Methods in Molecular Biology; Glover (ed.) (1985) DNA Cloning Vol. I and II, IRL
Press, Oxford, UK; Hames and Higgins (eds.) (1985) Nucleic Acid Hybridization, IRL Press, Oxford, UK; Setlow
and Hollaender (1979) Genetic Engineering: Principles and Methods, Vols. 1-4, Plenum Press, New York.
Abbreviations and nomenclature, where employed, are deemed standard in the field and commonly used in
professional journals such as those cited herein.

The present work describes the isolation and identification of cDNA and genomic DNA encoding S-gene
proteins of gametophytic self-incompatible plants, particularly those encoding the S-genes of Nicotiana alata.
The initial isolation of cDNA of S-genes, as applied to the S2-gene of Nicotiana alata, involved the preparation
of a cDNA library from poly(A+) RNA of mature styles which was then differentially screened employing
radioactively labelled cDNA from ovary and green bud style to remove non-mature style specific cDNA. The
resulting mature style specific clones were then probed with an oligonucleotide probe specific for the desired
S-gene. The specific probe was based on either the amino acid sequence of the S-protein or on the nucleotide
sequence of a cDNA fragment produced from stylar mRNA by specific priming with mixed oligonucleotide
primers which were based on the amino acid sequence of the S-protein. Alternatively, the specifically primed
cDNA fragment can be used directly as a probe of the mature style clones. Screening of the mature style
clones with an S-gene specific probe results in the isolation of cDNA clones which contain S-gene coding
sequences, including those which are full length and encode the entire S-protein and its attendant signal or
transit sequence. In general, the procedure described above is applicable to the isolation of any gametophytic
S-gene cDNA.

The alternative methods for screening the mature style specific clone library to obtain S-gene cDNA require
a knowledge of the amino acid sequence of the S-protein. S-protein is made in minuscule amounts at limited
times in limited tissue. Several hundred styles must be dissected from flowers in order to obtain sufficient pure
S-protein for micro-amino acid sequencing. Consequently, the determination of S-protein amino acid
sequence requires significant time and effort. Alternative screening methods for isolating S-gene cDNA clones
are therefore desirable. Initially it was believed that there was enough structural similarity between the S-gene
coding regions, as indicated by hybridization experiments and N-terminal amino acid sequencing, that the
cDNA clone of one S-gene could be employed directly as a probe to isolate cDNA clones of other S-genes.
This was expected to be true particularly for S-alleles of the same or related plants. In practice it was found
that this direct screening method did not work in all cases. For example, screening of a Nicotiana alata S3S3
cDNA library with the N. alata S2 cDNA clone resulted in the isolation of S3 cDNA clones. In contrast, this
method was not successful for the isolation of N. alata S6 or S1 cDNA clones.

A new screening procedure was developed for the isolation of the various S-alleles of Nicotiana alata. This
procedure involves the differential screening of a mature style cDNA library with cDNA prepared from styles of
the same genotype as the library and cDNA prepared from style RNA of another genotype. This procedure is
particularly effective because RNA encoding the S-glycoproteins is very abundant. The S-clones hybridize very
strongly to cDNA prepared from RNA of the same genotype, while they hybridize weakly with cDNA from other
genotypes. This procedure was specifically employed to isolate N. alata S3 and S6 cDNA clones and is
generally applicable to the isolation of any N. alata S-gene cDNA. Further, the procedure is applicable generally
to the isolation of S-gene cDNA clones in other gametophytic species if the variation in DNA sequence among
the S-alleles in that species is comparable to the DNA sequence differences among Nicotiana alata S-alleles.
This procedure is not expected to work for selecting S-alleles in the sporophytic system since there appears
to be much higher homology (70-75%) among the various S-alleles of Brassica.
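The selection logic of this differential screen can be sketched in code. This is an illustration under stated
assumptions, not the procedure as actually scored: in practice plaques were judged from autoradiographs,
whereas the sketch assumes each plaque has been given a numeric signal for the same-genotype probe and the
other-genotype probe. The plaque names, signal values, the 10-fold ratio cut-off and the minimum signal are
all hypothetical.

```python
from typing import Dict, List, Tuple


def select_allele_specific(
    signals: Dict[str, Tuple[float, float]],
    min_ratio: float = 10.0,   # hypothetical cut-off for "strong versus weak"
    min_same: float = 1.0,     # hypothetical floor to discard faint plaques
) -> List[str]:
    """Return plaques that hybridize strongly to cDNA of the library's own
    genotype and only weakly to cDNA of another genotype.

    `signals` maps a plaque name to (same_genotype_signal, other_genotype_signal).
    """
    selected = []
    for plaque, (same, other) in signals.items():
        if same < min_same:
            continue  # too faint with the same-genotype probe to be of interest
        ratio = float("inf") if other == 0 else same / other
        if ratio >= min_ratio:
            selected.append(plaque)
    return selected


# Hypothetical signals for four plaques from a single-genotype style library.
example = {
    "plaque_1": (50.0, 1.0),   # strong with own genotype, weak with the other: keep
    "plaque_2": (40.0, 35.0),  # similar with both probes: not allele specific
    "plaque_3": (0.2, 0.1),    # faint with both probes: discard
    "plaque_4": (20.0, 0.0),   # no signal with the other genotype: keep
}

print(select_allele_specific(example))  # ['plaque_1', 'plaque_4']
```

A ratio test of this kind only discriminates when the alleles differ substantially in sequence, which is
consistent with the expectation stated above that the approach would not work for the more homologous
Brassica S-alleles.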

Once S-gene cDNA clones are isolated they can be employed as hybridization probes of genomic DNA to
locate and isolate genomic S-gene clones. This procedure has been used specifically to isolate the S2-gene of
Nicotiana alata, including the S2-protein coding sequence and the 5' and 3' flanking regions of the gene. Within
the upstream flanking region of the S2 gene a region having strong homology to mitochondrial DNA of
gametophytically self-incompatible plants was identified. This region functions in the regulation of tissue
specific expression of the S gene.

Isolation of cDNA encoding the 32K S2-gene protein of N. alata
A method for isolating and purifying the S-gene associated glycoproteins from mature styles had been
established using a combination of ion exchange and affinity chromatography (U.S. Patent Application Serial
Nos. 615,079 and 050,747). This method had been applied to the isolation and purification of N. alata
S2-protein. More recently, purified protein yield improvements have been obtained by using a less basic buffer
(pH 7.0 rather than pH 7.8) in affinity chromatography. The S-protein appears to be more stable at lower pH. As
illustrated in Fig. 1, it was possible to isolate a single component of MW 32K associated with the S2-allele of
Nicotiana alata. Chemical deglycosylation of this component yielded a single product of approximately 26 kd in
molecular weight, shown in Figure 2a. The results of in vitro translation of mRNA from mature styles, green bud
style and ovary are shown in Fig. 2b. Total RNA was isolated by conventional methods. Since most mRNA is
polyadenylated, poly(dT) cellulose chromatography was used to isolate mRNA, as poly(A+) RNA. The various
poly(A+) RNA fractions were translated using an amino acid depleted rabbit reticulocyte lysate kit (Amersham
No. N.150, Arlington Heights, Ill.) in the presence of tritiated amino acids. An in vitro translation product of
approximately 27 kd molecular weight was detected only from mature style mRNA. This product was slightly
larger than the chemically deglycosylated protein. It was therefore identified as the full length immature
S2-protein, which is composed of mature S2-protein and its signal peptide.

Based on this finding, a protocol of differential screening was adopted as the initial part of the strategy to
isolate cDNA coding for S2-protein. A cDNA library was prepared in λgt10 phage using mature style poly(A+)
RNA of N. alata genotype S2S3. Mature style poly(A+) RNA was transcribed into double stranded cDNA by
conventional methods (Maniatis et al., 1982, supra). End-repair, EcoRI methylation and EcoRI linker ligation
reactions were carried out and the cDNA was cloned into the EcoRI site of the λgt10 vector (Huynh, T. et al.
(1985) in Practical Approaches in Biochemistry, DNA Cloning Vol. 1, ed. Glover, D., IRL, Oxford, pp. 49-78). This
library was subjected to differential screening using 32P-labelled cDNA from mature and green bud styles. The
lambda phage was used to infect Escherichia coli C600 cells. Plaques that hybridized strongly only to the
mature style cDNA were selected and differentially screened a second time using 32P-labelled cDNA prepared
from either mature style or ovary mRNA. Again plaques that hybridized strongly only to the mature style cDNA
were selected. Ovary cDNA was used in this second screen because SDS-gel electrophoresis indicated that
extracts of mature style and ovary had some common proteins which were not expressed in green bud styles
(Figure 3). Surprisingly, tissues other than ovary and green bud were found to be unsuitable sources of cDNA
for differential screening since the protein profiles of other organs were found to be too diverse from that of
mature style to be useful. Therefore, differential screening with ovary and green bud cDNA, although
considerably less convenient, was necessary to discriminate mature style-specific cDNA. The resultant cDNA
clones were specific for mature style.

Once the cDNA mature style library had been differentially screened, an S2-protein specific DNA probe was
required for final screening of the clone library. The first step in the preparation of the probe was the
determination of the N-terminal amino acid sequence of the N. alata S2-protein (Table 1). Conventional
microsequencing techniques were used (Hewick, R.M. et al. (1981) J. Biol. Chem. 256:7990-7997). As a
consequence of the limited availability of S-protein, only short segments of N-terminal sequence could be
determined using conventional microsequencing techniques. Unfortunately, the N-terminal amino acid
sequence of the S2-protein proved to have highly redundant coding oligonucleotide possibilities.
Nevertheless, a partial-length cDNA was isolated by the following procedure. A set of synthetic mixed
oligonucleotide primers was prepared based on the partial amino acid sequence. A set of 24 14-mers,
covering all the codon ambiguities at amino acids 4-8, was synthesized. These synthetic mixed
oligonucleotides were then used in three batches of eight 14-mers each, to prime synthesis of cDNA from N.
alata (S2S3) mature style poly(A+) RNA.
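The way a short peptide gives rise to a set of mixed oligonucleotides can be sketched as follows. This is an
illustration only: the five-residue peptide D-Q-I-E-M is hypothetical and was chosen solely because its codon
degeneracy happens to give 24 distinct 14-mers; it is not the S2-protein sequence of Table 1. The function
names are likewise illustrative. A 14-mer spans four full codons plus the first two bases of the fifth, and
the primers are written as the reverse complement of the coding strand so that they can anneal to mRNA.

```python
from itertools import product
from typing import List

# Standard genetic code entries for the residues of the hypothetical peptide.
CODONS = {
    "D": ["GAT", "GAC"],
    "Q": ["CAA", "CAG"],
    "I": ["ATT", "ATC", "ATA"],
    "E": ["GAA", "GAG"],
    "M": ["ATG"],
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def coding_oligos(peptide: str, length: int) -> List[str]:
    """Enumerate every coding-strand sequence for `peptide`, truncated to `length`."""
    per_residue = [CODONS[aa] for aa in peptide]
    variants = {"".join(combo)[:length] for combo in product(*per_residue)}
    return sorted(variants)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the strand that anneals to `seq`."""
    return seq.translate(COMPLEMENT)[::-1]


def split_batches(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


oligos = coding_oligos("DQIEM", 14)                  # coding-strand 14-mers
primers = [reverse_complement(o) for o in oligos]    # mRNA-complementary primers
batches = split_batches(primers, 8)

print(len(oligos))                # 24
print([len(b) for b in batches])  # [8, 8, 8]
```

Splitting the pool into smaller batches, as described above, lowers the complexity of each priming reaction
so that the batch containing the correctly matched primer can be identified.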
As shown in Figure 4, only one batch (No. 166) was found to be specific for the priming reaction.
Surprisingly, a single cDNA band 100 nucleotides in length was identified in this reaction. A 100 nucleotide
band was also observed when the pooled 14-mers were used to prime poly(A+) RNA from mature styles; only
traces of this fragment were detected in priming from ovary or green bud style mRNA.

The 100 nucleotide long band was eluted from an acrylamide gel and sequenced, yielding the S2-protein
coding sequence from amino acid -12 in the signal sequence up to amino acid 2 of the mature protein (Table 2).
From this sequence a single 30-mer was synthesized which covered the part of the signal sequence to -9 and
included the first amino acid codon of the coding sequence (Table 2). This amino acid region was chosen in
order to ensure that the synthetic probe would identify cDNA clones that extended into the signal sequence
codons. This strategy was adopted for convenience, since adequately large amounts of the synthetic probe
could be prepared in a single synthesis. Alternatively, the 100 bp fragment could have been cloned, amplified,
purified and radioactively labelled for use as a probe.

The 30-mer was used as an S2-protein specific probe to screen the mature style-specific clones previously
identified by differential screening. One of the clones obtained was chosen for further study. The clone,
designated NA-2-1, contained a cDNA insert of 877 bp which could be excised as a single fragment from the
lambda vector by EcoRI digestion. The 877 bp insert has been cloned into M13 phage (M13mp8) and was
deposited with the American Type Culture Collection, 12301 Parklawn Drive, Rockville, Maryland 20852
(Accession No. 40201).

In sequencing the NA-2-1 insert it was found that it did not extend in the 5' direction to an ATG initiation
codon, and so did not contain the full signal sequence. A full-length clone was obtained from a second cDNA
library which had been prepared using a method (Okayama et al. (1982) Mol. Cell. Biol. 2:161-170) which
optimizes the recovery of full length clones. This library was screened with the 30-mer probe as well as with the
cDNA insert from clone NA-2-1 (described above). A clone designated NA-2-2 was obtained which hybridized
to both probes. Table 3 provides the nucleotide sequence of the cDNA insert from NA-2-2. The NA-2-2 clone
insert was subcloned into M13 phage (M13mp8), designated pAEC9, and was deposited with the American
Type Culture Collection, 12301 Parklawn Drive, Rockville, Maryland 20852 on April 16, 1986, and has been given
Accession No. 40233.

The sequence of the full length cDNA insert of clone NA-2-2 (Table 3) includes an ATG at its 5' end that is a
potential initiation codon. The sequence contains an open reading frame of 642 bp which encodes a protein
with a predicted molecular weight of 24,847 that includes a putative signal sequence of 22 amino acids. Table 8
provides the amino acid abbreviations used in the Tables of sequences. The sequence of Table 3 encodes the
mature S2-protein (192 amino acids) with a signal sequence that would direct the extracellular transfer of the
S2 glycoprotein from the transmitting tract cells. The full-length signal sequence has the typical features
described for eukaryotic signal sequences (von Heijne (1983) Eur. J. Biochem. 133:17-21; and von Heijne
(1985) J. Mol. Biol. 184:99-105).
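The figures quoted for the NA-2-2 open reading frame can be cross-checked with simple arithmetic. The sketch
below uses only the numbers stated in the text; the variable names are illustrative, and the remark about the
stop codon is an inference from the arithmetic rather than a statement made in the specification.

```python
ORF_BP = 642           # open reading frame length stated in the text
SIGNAL_AA = 22         # putative signal sequence
MATURE_AA = 192        # mature S2-protein
PREDICTED_MW = 24847   # predicted molecular weight of the encoded protein

# The reading frame must contain a whole number of codons.
assert ORF_BP % 3 == 0
codons = ORF_BP // 3

# 642 / 3 = 214 codons, which equals 22 + 192 residues exactly. On this count
# the 642 bp figure appears not to include the stop codon (an inference only).
print(codons)                   # 214
print(SIGNAL_AA + MATURE_AA)    # 214

# Average mass per residue implied by the predicted molecular weight,
# assuming that weight refers to all 214 residues.
average_residue_mass = PREDICTED_MW / codons
print(round(average_residue_mass, 1))  # 116.1
```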

The initially isolated NA-2-1 S2 cDNA clone contained the entire S2-protein coding region, part of the signal
sequence, and a poly(A) tail 18 residues long. Differences in the sequence of the NA-2-1 cDNA clone and that
of the full-length clone are indicated in Table 3. Apart from the differences at the 5' end, clones NA-2-1 and
NA-2-2 also differ in the length of their 3' untranslated sequence. They are identical to nucleotide 682, which is
the polyadenylation site in clone NA-2-2. The clone insert from NA-2-1 has an additional 50 nucleotides of
untranslated mRNA and a poly(A) tail of 18 residues. This difference at the 3' end suggests that there are
alternative polyadenylation sites in S2 RNA transcripts.

It will be obvious to one of ordinary skill in the art that the DNA sequence information provided herein can be
used for the chemical synthesis of oligonucleotide probes that can be used in the hybridization screens
described herein. See, for example, Caruthers, M.H. (1984) Contemp. Top. Polym. Sci. 5:55-71; Eisenbeis, S.J.
et al. (1985) Proc. Natl. Acad. Sci. USA 82:1084-1088.

Hybridization of the N. alata S2 protein cDNA clone to poly(A+) RNA from mature styles of N. alata, L.
peruvianum and Brassica oleracea

A 32P-labelled copy of the cDNA insert from the NA-2-1 clone, which contains the S2-protein coding region,
was used in Northern blot hybridization experiments with poly(A+) RNA prepared from mature styles of N. alata
genotypes S1S3, S2S3, S2S2 and S3S3, as well as mature styles of L. peruvianum genotype S1S2, and green
bud styles and ovaries of N. alata genotype S1S2 (Figure 5). The size of the major transcript in mature styles
bearing the S2-allele was 940 bases, based on comparison to 5' end-labelled HindIII-EcoRI markers, with two
minor transcripts at 1500 and 3500 bp. The 940 base transcript was also present in RNA from S3S3 and S1S3
styles but at a much reduced frequency, that is 1% or less than the level in S2S2 or S2S3 styles. The major
transcript was not present in green bud RNA but was detected in RNA from ovaries of mature flowers, again at
a much lower concentration than that of mature styles (less than 1%).
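Transcript sizes on a blot such as this are commonly estimated by comparing migration distance with that of
markers of known length, on the approximation that the logarithm of fragment size falls linearly with
distance migrated. The following sketch shows that calculation. It is an assumption that an interpolation of
this kind was used here; the text states only that sizes were based on comparison to the markers. The marker
distances and sizes below are hypothetical and are not read from Figure 5.

```python
import math
from typing import List, Tuple

Marker = Tuple[float, float]  # (migration distance, size in bases)


def fit_log_size(markers: List[Marker]) -> Tuple[float, float]:
    """Least-squares fit of log10(size) = intercept + slope * distance."""
    xs = [distance for distance, _ in markers]
    ys = [math.log10(size) for _, size in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance: float, markers: List[Marker]) -> float:
    """Estimate the size of a band from its migration distance."""
    slope, intercept = fit_log_size(markers)
    return 10 ** (intercept + slope * distance)


# Hypothetical markers (distance in mm, size in bases), exactly log-linear.
markers = [(10.0, 10000.0), (20.0, 1000.0), (30.0, 100.0)]

print(round(estimate_size(25.0, markers), 1))  # 316.2
```

Real gels depart from log-linearity at the extremes of the size range, so in practice the fit is usually
restricted to markers that bracket the band of interest.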

Lycopersicon peruvianum genotype S1S3 contains readily detectable levels of a 2.5 kb mRNA that
hybridizes with the NA-2-1 cDNA insert. The S1 and S3 proteins from L. peruvianum both have estimated
molecular weights of 28 kd; the RNA blot analysis indicates that the mRNA transcripts encoding these
proteins are identical in size. Hybridization with Brassica oleracea mature style mRNA was faint under the
conditions used.

These results indicate homology between the DNA coding sequences of the N. alata S1 and S3 proteins and
the S2 protein of N. alata. Further, they indicate that there is homology between the coding sequences of the
N. alata S2 protein and those of Lycopersicon peruvianum S1 and S3 proteins. The origin of the weak
hybridization of the S2-protein cDNA probe to poly(A+) RNA from B. oleracea is unclear since there is no
homology between the cloned S-alleles of Nicotiana alata and those of Brassica.

Isolation of cDNA clones of Nicotiana alata S-alleles

Although hybridization experiments had initially indicated that the Nicotiana alata S2-gene cDNA could be
used in direct hybridization screening to obtain cDNA clones of other S-alleles of N. alata, this method was
found not to be generally successful. Northern analysis had shown that the S2 cDNA clone insert (NA-2-1 or
NA-2-2) cross hybridized with S3 mRNA, but the degree of hybridization was about 100 fold lower than that
obtained with the S2 cDNA probe on S2 mRNA. While S3 cDNA clones were obtained by direct screening of a
mature style specific S3S3 cDNA library with the S2 probe, they were not strongly hybridizing plaques. Once
S-cDNA clones of other N. alata S-genes were isolated (see below), it was found that the various S-alleles have
only about 55% overall homology at the DNA level. The substantial homology between the N. alata S-proteins
was confined to the N-terminal region of the protein (Table 1).

A different screening approach based on the structural differences among the N. alata S-alleles was then
devised to isolate N. alata S-allele cDNA, and was applied specifically to the isolation of N. alata S3 and S6
cDNA.

A cDNA library was prepared in λgt10 using mRNA from mature styles of genotype S3S3. Radioactively
labelled cDNA was prepared from mature styles of the S3S3 genotype and the S6S6 genotype. The cDNA
library was then differentially screened employing the labelled cDNA from the different genotypes. Plaques
that hybridized strongly to S3S3 cDNA and weakly to S6S6 cDNA were selected and rescreened with the S2
cDNA clone. The resulting clones were then used as probes in Northern blots containing RNA from several S
genotypes. S3 cDNA clones were those that hybridized most strongly to the RNA from styles which carry the
S3 allele. Hybridization of the S3 clones to RNA of genotypes which did not carry the S3 allele was significantly
weaker (10-100 fold lower). One of the S3 clones was selected for sequencing and its sequence is presented in
Table 4. This clone was nearly full length; however, a short subfragment at the 5' end of the clone was
inadvertently cleaved when the clone was sequenced. The sequence 5' to the EcoRI site (indicated in Table 4)
has been determined by RNA sequencing. The N-terminal amino acid sequence of the mature S3 protein was
obtained by microsequencing analysis. The signal sequence has not yet been obtained.

An analogous procedure was employed to isolate S6 cDNA clones from a mature style library of the S6S6
genotype. Initial selection was made for clones which strongly hybridized to S6S6 cDNA and weakly hybridized
to S3S3 cDNA. One of the S6 clones was selected for sequencing and its sequence is presented in Table 5.
This clone contains the entire S6 protein coding sequence and a portion of the signal sequence. The clone
does not extend in the 3' direction to a poly(A) tail.

In general, analogous differential screening procedures can be applied to the isolation of cDNA clones of
other S alleles of Nicotiana alata.

Isolation of a chromosomal S-gene using an S-allele specific cDNA clone as a hybridization probe

DNA can be isolated from a self-incompatible plant of known S genotype by conventional methods, as for
example those described by Rivin, C. J. et al. (1982) in Maize for Biological Research (W. F. Sheridan, ed.)
pp. 161-164, Plant Mol. Biol. Assn., Charlottesville, Virginia; and Mazure, B. J. and Chui, C.-F. (1985), and
Bernatzky and Tanksley (1986) Theor. Appl. Genet. 72:314-321. A genomic DNA library can then be
constructed in an appropriate vector. This involves cleaving the genomic DNA with a restriction endonuclease,
size selecting DNA fragments and inserting these fragments into a cloning site of the chosen vector. A
description of the construction, for example, of a Nicotiana tabacum genomic library in the phage lambda has
been given by Mazure, B. J. and Chui, C.-F., 1985, supra.

Genomic S-allele clones are selected by screening the genotype specific genomic library with a
radioactively labelled cDNA S-allele clone insert hybridization probe, for example in a filter hybridization
screen. An appropriate microorganism is infected with the phage lambda containing the genomic library. The
infected organisms can be plated on agarose at a concentration of several thousand plaque forming
units/plate and replicated onto nitrocellulose filters. The labelled probe can then be applied to the filter and
allowed to hybridize. Plaques that show hybridization to the probe are selected, replated and rehybridized until
a pure phage is isolated. DNA from selected phage can then be purified, restricted, separated on agarose gels
and transferred by blotting to nitrocellulose filters. These filters can then be reprobed with the labelled cDNA
S-allele probe to identify those restriction fragments that contain S-protein coding sequences. Standard
hybridization conditions for such screens have been described (Maniatis et al., 1982, supra).

This procedure was specifically applied to the isolation of the chromosomal S2 gene of Nicotiana alata. Total
DNA was isolated from leaves of plants of the S2S2 genotype. In Southern blot hybridization experiments it
was established that labelled S2 cDNA probe (NA-2-1 or NA-2-2) hybridized to a single approximately 3.1 kb
fragment generated by EcoRI digestion of S2S2 genomic DNA. This fragment was cloned into λgt10. The
chromosomal S2 gene was then sequenced using the dideoxy method. The sequence of the genomic S2 gene
is provided in Table 6. As shown, the S2 coding sequence (nucleotides 1603-2338) is interrupted by a single
94 bp intron. The transcription start has been mapped, as indicated, to a position 19 bases upstream (at
position 1584) of the ATG start codon. The sequence includes 5' regulatory sequences extending 1583 bp
upstream of the transcription start and contains sequences required for regulated expression of the S2 gene
product in reproductive tissue. A putative "TATA" box is identified at nucleotides 1549-1559. The sequence
also includes the two polyadenylation signals identified at the 3' ends of the S2 cDNA clones: T1 (2410-2415)
and T2 (2456-2461).
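The coordinates quoted for the genomic S2 sequence are mutually consistent, as the following short check
shows. It uses only the positions stated in the text, treated as 1-based inclusive coordinates (an assumption
about the numbering convention of Table 6); the variable and function names are illustrative.

```python
CDS_START, CDS_END = 1603, 2338      # S2 coding sequence, intron included
INTRON_BP = 94                       # single intron
TRANSCRIPTION_START = 1584
TATA_BOX = (1549, 1559)
POLYA_SIGNALS = {"T1": (2410, 2415), "T2": (2456, 2461)}


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


cds_span = span_length(CDS_START, CDS_END)   # genomic span of the coding region
exonic_bp = cds_span - INTRON_BP             # coding bases after intron removal
leader_bp = CDS_START - TRANSCRIPTION_START  # transcription start to ATG
upstream_bp = TRANSCRIPTION_START - 1        # sequence 5' of the transcription start

print(cds_span)     # 736
print(exonic_bp)    # 642, matching the 642 bp open reading frame of the cDNA
print(leader_bp)    # 19
print(upstream_bp)  # 1583

# Each polyadenylation signal spans six nucleotides.
print({name: span_length(*pos) for name, pos in POLYA_SIGNALS.items()})

# Gap between the end of the putative TATA box and the transcription start.
print(TRANSCRIPTION_START - TATA_BOX[1] - 1)  # 24
```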

A segment has been identified within the upstream region of the S2 gene that shows homology with
mitochondrial DNA on Southern blots. The 3.1 kb S2 gene EcoRI fragment was digested with HindII and an
approximately 1 kb fragment which extends from 354 bp upstream of the coding region was isolated and used
as a probe in Southern blots of HindIII digests of total DNA from N. alata and Lycopersicon esculentum. This
probe produced a highly repeated pattern including a band of about 750 bp on N. alata but only one major band
of about 750 bp on L. esculentum (Figure 6A). Subsequent hybridizations with DNA from L. esculentum and the
related L. pennellii that had been digested with 12 different enzymes revealed no polymorphism of the probe
sequence. The 1 kb fragment was also used in Southern blots to probe mitochondrial DNA HindIII digests of N.
alata and L. esculentum (Figure 6A). The homologous segment is clearly demonstrated in both species to be in
the mitochondrial DNA. Further experiments indicated that the homologous sequence is integrated into the
high molecular weight chromosomal DNA and not in an extrachromosomal element. The 750 bp mitochondrial
DNA fragment of N. alata that hybridized to the 1.0 kb HindII fragment was then isolated and used as a probe
on Southern blots of HindIII digests of total and mitochondrial DNA of both species (Figure 6B). The
mitochondrial DNA probe hybridized to a single fragment in total and mitochondrial DNA of both species.
This indicates that the sequence responsible for the repeated hybridization pattern on total DNA of N. alata
(Figure 6A) and the sequence that is homologous to mitochondrial DNA are separate elements on the 1.0 kb
subfragment of the S2 gene genomic clone. The 750 bp mitochondrial DNA fragment of N. alata was found not
to hybridize to mitochondrial DNA of maize under moderate stringency hybridization conditions.

The region of N. alata DNA that is homologous to the 1.0 kb S2 gene fragment was found to be confined to a
315 bp HindIII/HincII subfragment of the 750 bp mitochondrial DNA fragment. This subfragment was
sequenced and its sequence was compared to that of the upstream region of the S2 gene (Table 7). Alignment
of the mitochondrial and nuclear sequence revealed a 56 bp segment of very high homology (53/56 bp). The
position of this homologous region in the S2 gene sequence is indicated in Table 6. There are two additional
short, perfectly matched sequences 3' from the 56 bp segment (underlined in Tables 6 and 7) which occur in
both the mitochondrial and nuclear DNA. The spacing of these two sequences is different in the nuclear and
mitochondrial DNA fragments. The nuclear sequence also contains a short 8 bp direct repeat that immediately
flanks the region of homology (one of the repeats is within the homologous sequence). The first 7 bp of the
repeat perfectly match the terminal portion of the inverted repeat of the S-2 plasmid of maize that is found in
the mitochondria of S male-sterile cytoplasm (Levings and Sederoff (1983) Proc. Natl. Acad. Sci. USA
80:4055-4059). The presence of direct repeats in the nuclear sequence is consistent with features of
transposable element excision (Nevers et al. (1986) Adv. Bot. Res. 12:103-203). The similarities of sequence
between the nuclear and mitochondrial DNA segments of Table 7 and the presence of transposable element
features suggest that the homologous region has been transferred between organelles; however, the direction
of transfer is unknown. A comparison of the 56 bp and the entire 315 bp mitochondrial segment with the plant,
organelle, viral and structural DNA sequences compiled in the GenBank database (U.S. Department of Health
and Human Services, Theoretical Biology and Biophysics Group, Los Alamos Natl. Laboratory, Los Alamos,
New Mexico) reveals no significant sequence homologies.

When Southern blots of total DNA digests of N. alata, L. esculentum and L. pennellii are probed with the 750
bp mitochondrial clone, hybridization to other fragments is observed after long exposures of the blots to film
(Figure 7A). These results indicate that the mitochondrial clone hybridizes to other regions of nuclear DNA.
This is also supported by the results of an analogous hybridization experiment in which total DNA digests of six
F2 progeny of a cross between L. esculentum X L. pennellii were probed (Figure 7B). Since all of the progeny
have the same cytoplasm, the differences in patterns between the individual progeny are most likely due to
segregation of nuclear fragments.

The presence of the mitochondrial homologous region within the upstream region of the S2 gene indicates
that it has a function in the regulation of expression of that gene. The presence of the homolog in mitochondrial
DNA could indicate the presence of a similarly regulated cytoplasmic gene associated with the mechanism of
gametophytic self-incompatibility. Although a cytoplasmic component is not usually associated with
self-incompatibility, there are certain aberrations of the system, such as the generation of new allelic
specificities that appear first in the stylar (maternal) tissue, that might be explained by such a cytoplasmic
component.

Synthesis of S-protein in heterologous in vivo expression systems

The S-protein DNA coding sequences whose isolation is described herein can be used to direct synthesis of
significant amounts of active S-protein.

The DNA encoding the S-protein can be inserted into a recombinant vector so that it is under the control of
its own regulatory sequences, an endogenous regulatory region of the vector or an inserted regulatory region
by conventional recombinant DNA techniques. The choice of recombinant vector is not crucial. A partial list of
vectors includes lambda or M13 bacteriophage, Ti or Ri plasmids of Agrobacterium, pBR322-derived plasmids,
and plant viral vectors such as brome mosaic virus (BMV) or tobacco mosaic virus (TMV). An appropriate host
microorganism or plant cell is then transformed with the vector containing S-protein coding sequences.
Transformed organisms or cells are selected by conventional means and assayed for the expression of active
S-protein, for example by an in vitro pollen tube inhibition assay or by immunoassay. Transformants which
produce active protein can then be grown in liquid medium for an appropriate time to allow synthesis of
S-protein, which is then isolated and subjected to further purification, if necessary. S-protein sequences can be
maintained on the vector or integrated into the chromosomal DNA of the host, where the S-protein sequences
will be flanked by DNA sequences of the host.

Yeast expression systems are particularly useful for the expression of plant proteins since correct
post-translational processing of plant proteins has been observed in such systems. Detailed descriptions of the
expression of plant proteins in yeast are given in Rothstein, S.J. et al. (1984) Nature 308:662-665; Langridge, P.
et al. (1984) EMBO J. 3:2467-2471; Edens, L. et al., 1984, supra; and Cramer, J.A. et al. (1985) Proc. Natl. Acad.
Sci. 82:334-338.

Alternatively, plant proteins can be expressed using similar techniques in bacteria as exemplified in Edens,
L. et al. (1982) Gene 18:1-12, which described the expression of the plant protein thaumatin in Escherichia coli.
When a bacterial system is employed, the DNA encoding the S-protein should be free of introns, as will be the
case with cDNA.

While the presence of a complete signal sequence is not essential to obtain expression of active protein in
either yeast or bacteria, more efficient protein synthesis has been observed in yeast when signal sequences
are present (Edens, L. et al., 1984, supra).

Regulated expression of proteins in reproductive tissue of self-incompatible plants

In situ hybridization experiments in N. alata described in Cornish et al. (1987) Nature 326:99-102 have
established that the gene encoding the S-protein is expressed throughout the female secretory tissue: the
stigma, style transmitting tissue and the epidermis of the placenta. More recently, we have found in similar in
situ hybridization experiments on pollen and anther sections that the S-genes of N. alata are expressed in
pollen. The 5' non-coding regions of the S-genes thus contain regulatory sequences which direct expression
of downstream coding sequences in reproductive tissue of self-incompatible plants. These regulatory
sequences can be employed to selectively express a desired protein in plant reproductive tissue. Selective
expression can be accomplished by the construction of chimaeric genes in which a desired structural gene is
placed under the regulatory control of the S-gene regulatory sequences. Such chimaeric genes can then be
introduced into plant cells or tissue regenerable into whole plants, where the desired structural gene is
selectively expressed in reproductive tissue.

Example 1: Sources of Plant Materials

Seeds of heterozygous genotypes S2S3 and S1S3 of N. alata were provided by Dr. K.K. Pandey (Grasslands,
Palmerston North, New Zealand) and genotype SpSr was a gift of Dr. G. Breidemeijer (Stichting Ital.,
Wageningen, The Netherlands). L. peruvianum heterozygous genotypes S1S2 and S1S3 were obtained from
the Victoria State Department of Agriculture, Burnley, Victoria, Australia. Plants homozygous for the S2-, S3-
and S6-alleles were generated by bud self-pollination as described in U.S. Patent Application Serial
Nos. 615,079 and 050,747. Briefly, buds generated from N. alata heterozygous plants were emasculated at the
elongated bud stage by carefully slitting the corolla with fine forceps and gently removing the immature
anthers. Twenty-four hours after emasculation, just prior to the development of petal coloration, the immature
stigmas were pollinated with self pollen from a mature dehisced anther of another flower. Prior to pollination, the
stigma surface was coated with either (i) exudate from a mature stigma (applied by gently touching the two
stigmas together) or (ii) 15% sucrose in 0.001% borate (applied by carefully touching the stigma to a drop of
solution). After this treatment, stigmas were pollinated by gently touching them into a glass Petri dish
containing mature pollen or by carefully brushing pollen onto the stigma surface. To prevent premature flower
drop, the flower axis was smeared with a little 1% (w/w) indole acetic acid in raw lanolin. The genotypes of F1
progeny of bud-pollinated plants were established by test crossing against plants of known self-incompatibility
genotype.

B. oleracea mixed genotype, L. esculentum (tomato) cv. Grosse-Lisse and L. pennellii (LA716) (a wild
relative of tomato which was obtained from C.M. Rick, University of California, Davis, CA) were employed in
hybridization experiments.

Mature non-pollinated styles were obtained from flowers that had been emasculated at the onset of petal
coloration or from yellow buds. These mature styles were removed and used immediately or stored at -70°C.
Styles refer to stigma and style, which were excised together; the ovary was separated from the styles. Green bud
styles refer to immature styles before the onset of self-incompatibility.

Example 2: Purification of 32K S2-protein from Nicotiana alata styles

Flowers from N. alata (genotype S2S3) were emasculated at the onset of petal coloration. Two days later, the
fully mature styles were removed and stored at -70°C. (Styles refer to the style and stigma which were
removed together; ovary is not included.) Frozen styles (3 g) were ground to a fine powder in liquid nitrogen
using a mortar and pestle; this was followed by further grinding in 50 ml of extracting buffer (50 mM Tris-HCl,
pH 8.5, 1 mM CaCl2, 20 mM NaCl, 1 mM DTT, 10 mM EDTA and 1% (w/w) insoluble polyvinylpyrrolidone). The
homogenate was centrifuged (12,000 g; 15 minutes) and the supernatant (11 ml) was collected.

Prior to ion exchange chromatography the style extract (11 ml) was equilibrated with NH4HCO3 (5 mM, pH
8.6), NaCl (1 mM), CaCl2 (1 mM), EDTA (1 mM) by passage through a Sephadex G-25 (Trademark, Pharmacia
Inc., Uppsala, Sweden) column (1.6 cm diameter; 22 cm long, void volume 11 ml). The first 16 ml eluted after
the void volume was collected and applied to DEAE-Sepharose (Trademark, Pharmacia Inc., Uppsala, Sweden)
(bed volume 26 ml, 1.6 cm diameter x 13 cm long) which was equilibrated with the same ammonium
bicarbonate buffer. The column was then washed with this buffer (50 ml) before the application of a NaCl
gradient (0-0.5 M). The S2-protein was present in the unbound fractions, which were combined and
concentrated to a final volume of 16 ml by rotary evaporation at room temperature. The S2-protein was further
purified by affinity chromatography using ConA-Sepharose (Trademark, Pharmacia Inc., Uppsala, Sweden)
followed by gel filtration. ConA-Sepharose was washed with 5 volumes of methyl-α-D-mannoside (0.1 M) in
buffer: sodium acetate (10 mM, pH 7), 0.1 M NaCl, 1 mM MgCl2, 1 mM CaCl2, 1 mM MnCl2. The washed
ConA-Sepharose was then transferred to bicarbonate buffer, NaHCO3 (0.25 M, pH 8.8), for 1 hour at room
temperature; the bicarbonate buffer was changed 4 times during the 1 hour period. Four volumes of NaHCO3
(0.25 M, pH 8.8) containing 0.03% (v/v) glutaraldehyde were added and the ConA-Sepharose was then
washed with NaHCO3 (0.1 M, pH 8.0) containing 0.5 M NaCl, resuspended in buffer: sodium acetate (10 mM,
pH 7), 0.1 M NaCl, 1 mM MgCl2, 1 mM CaCl2, 1 mM MnCl2, and packed into a column (0.8 cm diameter, 14 cm
long). The unbound fraction from DEAE-Sepharose was equilibrated in 10 mM acetate buffer by passing
through a G25-Sephadex column equilibrated with 10 mM acetate buffer, then applied to the column. Unbound
material was collected, the column washed with 10 volumes of acetate buffer, and the bound material eluted
with 0.1 M or 0.2 M methyl-α-D-mannoside in acetate buffer. Fractions containing S2-protein were identified by
sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), collected and concentrated to 1 ml
by rotary evaporation. The use of a lower pH buffer represents an improvement over the method described in
U.S. Patent Application 615,079, and results in improved yields of purified S2-protein. The protein appears to
be more stable at lower pH.

The pooled fraction eluted by 0.1 M methyl-α-D-mannoside was applied to a column of Biogel P150
(Trademark, Biorad Laboratories, Richmond, California) to separate the methyl-α-D-mannoside from the
S2-protein (void volume 13 ml, 1.6 cm diameter, 36.5 cm long, equilibrated and run in NH4HCO3 (10 mM,
pH 8.5), 10 mM EDTA, 0.1 M NaCl, 1 mM CaCl2). A further passage through Biogel P2 (Trademark, Biorad
Laboratories, Richmond, California) in water was used to remove any trace of methyl-α-D-mannoside. The
purified S2-protein was essentially homogeneous by the criteria of SDS-PAGE (Figure 2a).

SDS-PAGE was performed according to Laemmli, U.K. and Favre, M. (1973) J. Mol. Biol. 80:575-583, using
12.5% (w/v) acrylamide. Samples were reduced in 1.43 M 2-mercaptoethanol in sample buffer with heating
for 2 minutes in a boiling water bath. After electrophoresis, gels were stained with Coomassie Blue.

Example 3: N-terminal amino acid sequence of the N. alata S2-protein

N-terminal sequencing was performed using an Applied Biosystems (Pfungstadt, West Germany) Model
470A gas phase sequencer. Approximately 200 µg of purified S2-glycoprotein was applied in aqueous solution
to a glass fibre disc and evaporated to dryness. The disc was placed in the reaction cell of the sequencer, the
protein was eluted and then subjected to 20 cycles of automated Edman degradation by the phenylisothiocyanate
procedure. The resultant amino acid phenylthiohydantoin derivatives were identified by HPLC techniques on
an IBM-CN column (IBM, Danbury, Connecticut) at 32°C using a sodium acetate-acetonitrile gradient, 20 mM
sodium acetate (pH 6-5.6) varying from 100%-65% (v/v) over 30 minutes. The identity of derivatives was
confirmed by comparison to known standard reference compounds.

Example 4: Comparison of the deglycosylated S2 genotype associated style glycoprotein with the in vitro
translation products of style and ovary poly(A+) RNA

Frozen mature styles of Nicotiana alata (S2S3 genotype) were ground to a fine powder in liquid nitrogen
using a mortar and pestle. Protein was extracted from this tissue and the S2-allele associated glycoprotein was
isolated by a combination of ion-exchange and affinity chromatography (U.S. Patent Application Serial
Nos. 615,079 and 050,747). This material was deglycosylated using a trifluoromethane sulphonic acid (TFMS)
procedure modified for use with small quantities of protein (Edge et al. (1981) Anal. Biochem. 118:131-137).

Purified S2-associated glycoprotein (200 ng) was lyophilized in a 10 ml glass tube with Teflon-lined screw
cap and dried over P2O5 in a desiccator for 18 hours. Anisole (60 µl) and TFMS (120 µl) were added and the
tube was flushed with N2 for 30 seconds and sealed. After 90 minutes at 25°C, 10 ml of a 1:9 mixture of
n-hexane:diethyl ether, precooled on dry ice, was added. The solution was placed on dry ice for 60 minutes,
centrifuged (500 g, 5 minutes, 4°C) and the supernatant discarded. The pellet was air-dried, resuspended in
buffer (300 µl) and the pH was adjusted to 6.8 by addition of pyridine:H2O (1:1). The sample was boiled for 2
minutes before electrophoresis.

Total RNA was isolated from ovary, green bud style or mature style by conventional methods using
guanidinium thiocyanate as a protein denaturant. Oligo(dT)-cellulose chromatography was used to isolate
polyadenylated mRNA, poly(A+) RNA. This poly(A+) RNA (2.0 or 0.5 µg) was translated using an amino
acid depleted rabbit reticulocyte lysate kit (Amersham, Arlington Heights, Illinois) in the presence of 150 mM
K+, 1.2 mM Mg2+ and tritiated amino acids. Leucine, lysine, phenylalanine, proline and tyrosine were used at
specific activities of 5.4, 3.1, 4.8, 3.8 and 4.0 TBq/mmol, respectively. The reaction volume was 25 µl. After
incubation for 90 minutes at 30°C, RNA was removed by treatment with bovine pancreatic ribonuclease (5 µl, 2
mg/ml) for 20 minutes at 37°C.

The glycosylated and deglycosylated samples of pure S2-allele protein were analyzed by SDS-polyacrylamide
gel electrophoresis (SDS-PAGE) using 15% acrylamide. The gels were stained with Coomassie Blue.

Similarly, the translation products of mature style poly(A+) RNA were separated by SDS-PAGE using
10-15% acrylamide gradient gels. The products were visualized after treatment of the gel with Amplify
(Trademark, Amersham, Arlington Heights, Illinois) and exposure to X-ray film. In both cases, molecular weight
markers were included in adjacent lanes and visualized with Coomassie Blue.

Example 5: Preparation of a cDNA library in bacteriophage λgt10

Poly(A+) RNA was isolated from mature styles of N. alata (genotype S2S3) as described above and
transcribed into double stranded cDNA (Maniatis et al., 1982, supra). Blunt-ended cDNA was prepared by end
repair with DNA polymerase. Eco RI sites contained in the cDNA were blocked by treatment with Eco RI
methylase. Synthetic Eco RI linkers were then ligated to the double stranded cDNA. The cDNA was then cloned
into the Eco RI site of λgt10 as described by Huynh et al., 1985, supra. This phage was used to infect
Escherichia coli C600 and plated.

Example 6: Differential screening of mature style cDNA library 
Poly(A+) RNA was isolated from mature style, green bud style or ovary of N. alata genotype S2S3. Single
stranded 32P-labelled cDNA hybridization probes were prepared by random priming from the individual RNA.
Lambda gt10 containing the mature style library was used to infect E. coli C600 and plated at a density of about
1000 plaque forming units/150 mm Petri plate. Duplicate nitrocellulose lifts were prepared for hybridization
(Maniatis et al., 1982, supra). The plaques were first screened with labelled cDNA probe from mature style and
green bud style. Plaques that hybridized strongly only to the mature style probe were selected, picked, purified
and subjected to a second differential screening using the probes to mature style and ovary. The resultant
plaques represent mature style specific clones.
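
By way of illustration only, the two-round selection just described can be written as a simple filter. The following Python sketch assumes that each plaque has been scored by eye as "strong", "weak" or "none" with each probe; the plaque names, the scores and the function differential_screen are hypothetical and are not data or software from this work.

```python
# Hypothetical hybridization scores per plaque and probe (illustrative only).
scores = {
    "plaque_a": {"mature_style": "strong", "green_bud_style": "none", "ovary": "none"},
    "plaque_b": {"mature_style": "strong", "green_bud_style": "strong", "ovary": "none"},
    "plaque_c": {"mature_style": "strong", "green_bud_style": "weak", "ovary": "strong"},
    "plaque_d": {"mature_style": "weak", "green_bud_style": "none", "ovary": "none"},
}


def differential_screen(scores, positive, negative):
    """Keep plaques that are 'strong' with the positive probe and not 'strong' with the negative probe."""
    return {
        name
        for name, s in scores.items()
        if s[positive] == "strong" and s[negative] != "strong"
    }


# Round 1: mature style versus green bud style.
round1 = differential_screen(scores, "mature_style", "green_bud_style")
# Round 2: the survivors are rescreened, mature style versus ovary.
round2 = differential_screen({p: scores[p] for p in round1}, "mature_style", "ovary")

print(sorted(round1))
print(sorted(round2))
```

With these invented scores only plaque_a survives both rounds, since it is the only plaque that hybridizes strongly to the mature style probe and not to the green bud style or ovary probes.
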

In these plaque hybridizations, the filters were treated prior to hybridization (prehybridized) for 2 hours and
during hybridization for 16 hours at 42°C with 5 X Denhardt's solution, 5 X SSC (20 X SSC is 3 M NaCl, 0.3 M
trisodium citrate), 50 µg/ml sonicated salmon sperm DNA, 60 mM sodium phosphate (pH 6.8), 1 mM sodium
pyrophosphate, 100 µM ATP and 50% deionized formamide. Probes were used at a specific activity of 4 x 10^
cpm/ml. Filters were washed in a 0.1 X SSC solution containing 0.1% SDS (sodium dodecyl sulfate) at 42°C.
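
For convenience, the salt concentrations implied by the SSC strengths used in these Examples can be calculated from the composition of the concentrated stock. The sketch below assumes the usual definition of 20 X SSC (3 M NaCl, 0.3 M trisodium citrate); the function name ssc_mM is introduced here for illustration only.

```python
# Composition of the 20 X SSC stock in mol/l (usual laboratory definition; assumed here).
STOCK_20X = {"NaCl": 3.0, "trisodium citrate": 0.3}


def ssc_mM(fold):
    """Return the millimolar concentration of each salt in `fold` X SSC."""
    return {salt: round(molar * 1000 * fold / 20, 3) for salt, molar in STOCK_20X.items()}


# Strengths that appear in the hybridization and wash steps of these Examples.
for fold in (5, 2, 1, 0.1):
    print(f"{fold} X SSC:", ssc_mM(fold))
```

On this basis 5 X SSC is 750 mM NaCl, 75 mM trisodium citrate, and the 0.1 X SSC wash is 15 mM NaCl, 1.5 mM trisodium citrate.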

Example 7: Isolation of the cDNA clones specific for the S2-allele associated protein

A set of 24 14-mer oligonucleotides was synthesized corresponding to all possible codon ambiguities at
amino acids 4-8 in the N-terminal sequence of the S2-protein (Table 1). Oligonucleotides were synthesized by
the solid-phase phosphoramidite methodology (Beaucage and Caruthers (1981) Tetrahedron Letters
22:1859) using an Applied Biosystems (Pfungstadt, West Germany) ABI Model 380A DNA synthesizer. The
14-mers were end labelled using T4 kinase in the presence of 32P-ATP (5000 Ci/mmol). These labelled 14-mers
(5 µg/ml) were used in three batches of 8 14-mers to prime selective cDNA synthesis using mature style
poly(A+) RNA. Reverse transcription reaction volume was 40 µl. The reaction contained 0.75 mM of dCTP,
dGTP, dTTP and dATP, 75 µg/ml poly(A+) RNA, 50 mM Tris-HCl (pH 8.3), 10 mM KCl, 8 mM MgCl2, 0.4 mM
dithiothreitol, 500 U/ml placental RNAase inhibitor and 500 U/ml AMV reverse transcriptase. After incubation
at 42°C for 90 minutes, the reactions were stopped by addition of EDTA to 50 mM, extracted with
phenol:chloroform 1:1 (v/v) and the product, labelled cDNA, was precipitated with ethanol. The pellets were
resuspended in 20 µl of a solution of 100 mM NaOH, 7 M urea and 10 mM EDTA. Samples were heated at 90°C
for 5 minutes before electrophoresis on an 8% (w/v) acrylamide/7 M urea gel. The gel was exposed to X-ray
film for 5 minutes to locate specifically primed cDNA products.
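
The number of oligonucleotides in the set follows from the degeneracy of the genetic code. As a check, the Python sketch below enumerates every 14-base coding sequence for residues 4-8 (Tyr-Met-Gln-Leu-Val, taken from Table 2), using all three bases of the first four codons and the first two bases of the Val codon. It is assumed here, since Table 1 is not reproduced in this passage, that the primers are the reverse complements of these coding sequences, as is required for an oligonucleotide to anneal to mRNA and prime reverse transcription.

```python
from itertools import product

# Standard genetic code: codons for residues 4-8 of the mature S2-protein (Table 2).
CODONS = {
    "Tyr": ["TAT", "TAC"],
    "Met": ["ATG"],
    "Gln": ["CAA", "CAG"],
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Val": ["GT"],  # first two bases only; the third base of Val is fully degenerate
}
PEPTIDE = ["Tyr", "Met", "Gln", "Leu", "Val"]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Every possible coding-strand 14-mer (3 + 3 + 3 + 3 + 2 bases).
coding_14mers = ["".join(parts) for parts in product(*(CODONS[aa] for aa in PEPTIDE))]
# Assumed primer orientation: antisense, so that the primer anneals to the mRNA.
primers = [reverse_complement(s) for s in coding_14mers]

print(len(primers), {len(p) for p in primers})
```

The enumeration gives 2 x 1 x 2 x 6 x 1 = 24 distinct 14-mers, in agreement with the size of the set, which divides into three batches of 8. How the 24 sequences were actually allocated to batches is not stated and is not modelled here.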

As shown in Figure 4, one of the batches of synthetic 14-mers primed synthesis of a 100 bp nucleotide
specific for mature style. This 100 bp nucleotide cDNA band was excised from the gel and eluted overnight
with shaking at 37°C in 0.5 M ammonium acetate and 1 mM EDTA. The eluate was concentrated by butanol
extraction, phenol:chloroform extracted and ethanol precipitated. The 100 bp nucleotide was then sequenced
using the technique of Maxam and Gilbert (1977) Proc. Natl. Acad. Sci. 74:560. The sequence of this
nucleotide, which corresponds to amino acids -12 to +8 of the S2-protein, is shown in Table 2.

A 30 bp-long synthetic oligonucleotide probe based on the sequence of the 100 bp cDNA and covering the
region -8 to +1 of the corresponding amino acid sequence was prepared as described above. The 30-mer
probe was end-labelled with 32P-ATP. This probe was then used to screen the mature style specific clones
obtained by differential screening of the λgt10 library. The hybridization of the 32P-labelled oligomer probe (4 x
10^ cpm/ml) was done as described above except that the formamide concentration was decreased to 20%
and the temperature was decreased to 37°C. Filters were washed using 2 x SSC at 37°C. Approximately
100,000 plaques from two separately prepared libraries were screened, yielding 5 clones that strongly
hybridized with the 30-mer probe. One λgt10 clone, designated NA-2-1, was selected for further study. This
clone was found to contain a single 877 bp insert which could be excised from the lambda vector by Eco RI
digestion. After sequencing of the NA-2-1 clone, it was found that an error had been made in reading the
sequencing gel of the 100 bp fragment. The sequence shown in Table 2 was used to prepare the 30-mer probe.
The sequence of the 30-mer probe that was used in screening therefore did not exactly correspond to the
NA-2-1 clone insert.

Example 8: Nucleotide sequence of NA-2-1 cDNA insert 

The excised 877 bp DNA insert was sequenced using the chain termination method (F. Sanger et al. (1977)
Proc. Natl. Acad. Sci. USA 74:5463-5467; Sanger et al. (1980) J. Mol. Biol. 143:161-178). The NA-2-1 clone
insert was found to contain the full S2 gene coding sequence but the sequence did not extend at the 5' end to
an ATG codon. This clone insert contained a nearly full length S2 gene cDNA. The full sequence of the NA-2-1
clone is not provided here; this sequence was provided in U.S. Patent Application Serial Nos. 792,435 and 854,139.
The sequence of the subsequently isolated full-length clone NA-2-2 (see below) is provided in Table 3 and the
sequence differences in the 3'-region of the two clones are indicated therein. In the sequencing of the NA-2-1
insert, a stop codon was identified in the middle of what was believed to be the protein coding sequence.
Protein sequencing of the polypeptide fragment corresponding to the coding region in question revealed that
an extra adenine nucleotide had been inserted in the region 171 - 182 of the clone, most likely as a result of a
sequencing artifact.
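
The effect of a single extra adenine in this region can be illustrated with the sequence of Table 3. The Python sketch below translates nucleotides 166-195 (by the numbering of Table 3) as read from the NA-2-2 sequence, and then the same stretch with one additional A placed in the run of adenines at 171-182. The misread sequence is a hypothetical reconstruction for illustration; the actual NA-2-1 gel reading is not reproduced in this specification.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 166-195 as read from Table 3 (NA-2-2), written codon by codon.
correct = "GAT" "GGA" "AAA" "AAA" "AAA" "AAT" "GAT" "CTG" "GAT" "GAA"
# Hypothetical misreading: one extra A inserted within the adenine run.
misread = correct[:6] + "A" + correct[6:]

print(translate(correct))  # DGKKKNDLDE
print(translate(misread))  # DGKKKK*SG*
```

In this sketch the extra A shifts the reading frame so that the codon following the adenine run becomes TGA, a stop codon, which is consistent with the stop codon observed in the NA-2-1 reading.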

Example 9: Northern blot analysis 

A 32P-labelled probe was prepared from the cDNA clone (NA-2-1) insert encoding the S2-allele associated
protein by random priming. Aliquots of poly(A+) RNA were fractionated on formaldehyde-1.2% (w/v) agarose
gels as described by Maniatis et al. (1982) supra, except that the gel was run in 20 mM morpholinopropane
sulfonic acid (pH 7.0), 5 mM sodium acetate and 0.1 mM EDTA (pH 8.0) as a buffer. The gel was blotted directly
onto nitrocellulose filters using 20 X SSC. Klenow-labelled Hind III/Eco RI lambda fragments were used as
molecular weight markers. Prehybridization and hybridization were carried out at 42°C as described for plaque
hybridization.

Example 10: Cloning and sequencing of the nearly full length S2-protein clone from NA-2-1 into M13mp8

The 877 bp NA-2-1 clone insert was excised from λgt10 with Eco RI restriction endonuclease. The DNA
fragments generated were precipitated with ethanol, dried in vacuo and resuspended in water to 0.25 µg
DNA/µl. The DNA fragments (2.5 µg) were then subjected to end repair by incubation at 37°C for 1 hour in 25
µl buffer containing: 2 mM each of dATP, dCTP, dGTP and dTTP, 10 units DNA polymerase I (Klenow
fragment), 50 mM Tris-HCl (pH 7.6), 10 mM MgCl2 and 10 mM dithiothreitol. The end-repaired fragments were
reprecipitated, dried in vacuo and again suspended in water to 0.25 µg DNA/µl.

The end repaired fragments were inserted into the commercially available vector M13mp8 which had been
cut with Sma I restriction endonuclease and dephosphorylated (Amersham, Arlington Heights, Illinois).
Blunt-end ligation was done using 1.25 µg of the end repaired fragments and 20 ng of M13mp8 in a buffer
containing 1 U/µl T4 ligase, 1 mM ATP, 66 mM Tris-HCl (pH 7.6), 5 mM MgCl2 and 5 mM dithiothreitol. The
ligation mixture (total volume of 20 µl) was incubated overnight at 4°C.

The ligation mixture (10 µl) was then used to transform 0.2 ml of competent E. coli JM101 cells (Messing, J.
et al. (1981) Nucleic Acids Res. 9:309). Clones containing the 877 bp S2-protein DNA fragment were identified
using the purified 877 bp S2-clone insert, labelled with 32P by random priming, as a hybridization probe. DNA was
purified from one of the selected clones and a DNA molecule designated pAEC5 was isolated which consisted of the
877 bp fragment inserted in the Sma I site of M13mp8.

Mature style poly(A+) RNA was used to prepare a second cDNA library in λgt10. The library was constructed
according to a method described by Okayama et al. (1982) Mol. Cell. Biol. 2:161-170, which was designed to
optimize isolation of full-length cDNA clones. A library containing 20,000 plaques was obtained from 5 µg of
poly(A+) RNA. This library was screened as described in Example 6 using the 30 bp-long synthetic
oligonucleotide probe as well as the 877 bp cDNA insert from the NA-2-1 clone of Example 7. One clone,
designated NA-2-2, which hybridized to both probes, was selected for further study.

The NA-2-2 cDNA insert was sequenced using the same methods employed to sequence the NA-2-1 insert.
Table 3 shows the sequence of the NA-2-2 cDNA insert, which contains the full structural coding region for the
mature S2-protein; this is identical to that of NA-2-1 except that there was no extra adenine nucleotide in
the NA-2-2 clone sequence. The NA-2-2 clone also encodes the full signal sequence, which extends 22 amino
acids on the N-terminal end of the mature protein. The derived amino acid sequences of the signal peptide of
NA-2-1 and NA-2-2 are identical up to amino acid -18. The discrepancy in sequence at the
5'-end between the two clones is believed to be the result of a sequencing artifact. The two clones differ
in the length of their 3' untranslated sequence. They are identical up to the polyadenylation site in clone NA-2-2;
the NA-2-1 clone contains an extra 50 nucleotides before the poly(A) tail.
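
The length of the signal sequence can be checked directly against Table 3. The Python sketch below translates the 5' end of the NA-2-2 coding sequence and locates the N-terminal residues of the mature protein (Ala-Phe-Glu-Tyr-Met-Gln-Leu-Val, Table 2). The nucleotide string is transcribed from Table 3, with illegible bases read so as to agree with the amino acid line printed above them; this reading is an assumption.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the start of dna into one-letter amino acid codes."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# 5' end of the NA-2-2 coding sequence, from the initiator ATG (assumed reading of Table 3).
cdna_5prime = (
    "ATG TCT AAA TCA CAG CTA ACG TCA GTT TTC TTC ATT "
    "TTG CTT TGT GCT CTT TCA CCG ATT TAT GGG "
    "GCT TTC GAG TAT ATG CAA CTC GTG"
).replace(" ", "")

protein = translate(cdna_5prime)
mature_n_terminus = "AFEYMQLV"  # residues 1-8 of the mature S2-protein (Table 2)
signal_length = protein.index(mature_n_terminus)

print(protein[:signal_length], signal_length)
```

On this reading the initiator methionine lies 22 residues before Ala-1 of the mature protein, in agreement with the 22 amino acid signal sequence stated above.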

Example 11: Isolation of N. alata S3 and S6 cDNA clones

cDNA libraries of genotypes S3S3 and S6S6 were prepared in λgt10 using mRNA from mature styles as
described in Example 4. Single stranded 32P-labelled cDNA hybridization probes were prepared by random
priming from the individual RNA. Plaque hybridization screens were performed essentially as described in
Example 4.

The S3 clones were selected by differential screening of the S3S3 cDNA library with S3S3 cDNA and S6S6
labelled cDNA. Plaques that hybridized strongly to S3S3 cDNA and weakly to S6S6 cDNA were selected and
rescreened with the labelled S2 cDNA clone (NA-2-1 or NA-2-2). Clones which hybridized to the S3S3 cDNA
and the S2 cDNA clone were then used as probes of northern blots containing RNA from several N. alata
S-genotypes. Clones which hybridized most strongly to RNA from styles which carry the S3-allele, and weakly
to RNA from styles which do not carry the S3-allele, were selected as S3 clones. The DNA sequence of one S3
clone selected by this procedure is provided in Table 4.

The S3 clone selected for sequencing is near full-length, but during subcloning into the pGEM vector for
sequencing, a short Eco RI fragment at the 5'-end of the clone was inadvertently deleted. Sequence extending
5' to the indicated Eco RI site was determined by RNA sequencing and the N-terminal amino acid sequence was
obtained by microsequencing analysis.

S6 cDNA clones were obtained using a similar differential screening procedure. Plaques were initially
selected if they hybridized strongly to S6S6 cDNA and poorly to S3S3 cDNA. The DNA sequence of one S6
clone selected by this procedure is provided in Table 5. This clone contains the entire S6 gene coding
sequence, but does not extend in the 5' direction to an ATG codon and so is not full length. Furthermore, the
sequenced S6 clone does not contain a poly(A) tail.

Example 12: Isolation and characterization of the chromosomal S2 gene 

Genomic DNA of the N. alata S2S2 genotype was isolated from leaves essentially as described in Bernatzky
and Tanksley, 1986, supra. The S2 cDNA clone was radioactively labelled and employed as a hybridization
probe of Southern blots of Eco RI digested S2S2 DNA. The S2 gene probe hybridized to a single approximately
3.1 kb Eco RI fragment. This fragment was isolated and cloned in λgt10 following ligation of Eco RI digested λgt10
with size fractionated (2.5 kb - 4.0 kb) Eco RI fragments. The 3.1 kb S2 gene fragment was sequenced and the
sequence is given in Table 6. The fragment includes an open reading frame extending from nucleotide 1603 to 2338
which is interrupted by a single 94 bp intron (nucleotides 1833 - 1927). The sequence includes the two polyadenylation
signals (T1 and T2) which had been identified in the two S2 cDNA clones. Conventional primer extension
techniques were employed to map the starting point of transcription to a "G" base 19 bp upstream of the ATG
start codon. Sequence analysis identified a putative "TATA" box (nucleotides 1549 - 1559) in the 5' upstream
region of the gene.
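
The coordinates given above can be checked for internal consistency by simple arithmetic. The Python sketch below uses only the figures stated in this Example (Table 6 numbering); the variable names are introduced here for illustration, and it is assumed that the stated open reading frame endpoints are inclusive.

```python
# Figures stated in the text (Table 6 numbering); endpoints assumed inclusive.
orf_start, orf_end = 1603, 2338
intron_length = 94
intron_first, intron_last = 1833, 1927
tss_offset = 19  # transcription starts 19 bp upstream of the ATG
tata_first, tata_last = 1549, 1559

genomic_span = orf_end - orf_start + 1                 # ORF length including the intron
coding_length = genomic_span - intron_length           # spliced coding length
inclusive_intron_span = intron_last - intron_first + 1 # positions covered by 1833 - 1927
transcription_start = orf_start - tss_offset
tata_to_start = transcription_start - tata_last

print(genomic_span, coding_length, coding_length % 3)
print(inclusive_intron_span, (genomic_span - inclusive_intron_span) % 3)
print(transcription_start, tata_to_start)
```

Removing a 94 bp intron from the 736 bp open reading frame leaves 642 bp, a whole number (214) of codons, whereas the 95 positions spanned inclusively by nucleotides 1833 - 1927 would not preserve the reading frame; one of the two stated intron boundaries therefore presumably lies on an adjacent exon base. The calculated transcription start (nucleotide 1584) lies 25 bp downstream of the 3' end of the putative TATA box.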

Analysis of the 5' non-coding region of the S2 genomic clone

Subclones of the 3.1 kb Eco RI S2 gene fragment were generated with Hinc II. An approximately 1.0 kb
subfragment extending 5' from nucleotide 1249 (Table 6) was used to probe Southern blots of total DNA from
N. alata and L. esculentum digested with Hind III. As shown in Figure 6A, this probe produced a highly repeated
pattern on N. alata DNA but hybridized to only one major band (approximately 750 bp) of L. esculentum DNA.
Mitochondrial DNA was then isolated from N. alata and L. esculentum using the DNAse I procedure (Kolodner
and Tewari (1972) Proc. Natl. Acad. Sci. USA 69:1830-1834). Southern blots of mitochondrial DNA were also
probed with the approximately 1.0 kb nuclear DNA fragment (Figure 6A). A comparison clearly indicates that
the 1.0 kb fragment contains a region that is homologous to mitochondrial DNA of both N. alata and L.
esculentum.

Mitochondrial DNA of N. alata was digested with Hind III and ligated into the bacterial plasmid vector pGEM
(Promega Biotec, Madison, Wisconsin) using T4 DNA ligase and transformed into E. coli JM109. The 750 bp
homologous fragment was identified by screening colony lifts with the approximately 1.0 kb Hinc II fragment of
the S2 gene. The mitochondrial DNA fragment was isolated and sequenced. The isolated 750 bp mitochondrial
DNA fragment was then radioactively labelled and used as a probe of Southern blots of total and mitochondrial
DNA of N. alata and L. esculentum (Figure 6B). The mitochondrial DNA fragment hybridized to a single
fragment in total DNA of both N. alata and L. esculentum. The repeated pattern of hybridization to total DNA of
N. alata in Figure 6B is apparently due to sequences in the 1 kb genomic clone outside of the mitochondrial
DNA homologous segment.

The 750 bp fragment was digested with Hinc II, blotted and probed with the 1.0 kb genomic fragment to
estimate the length of homology. The homologous sequence was found to occur on a 315 bp Hind III/Hinc II
fragment which was cloned into pGEM and sequenced (Table 7). Alignment of the mitochondrial and 1.0 kb S2
gene fragment sequences (Table 7) reveals a highly homologous 56 bp segment. Two additional short,
perfectly matched sequences are also found 3' to the 56 bp segment. The spacing of the matched sequences
is different in the mitochondrial and nuclear sequences. In addition, the nuclear sequence contains a short 8 bp
direct repeat that immediately flanks the 5' region of homology.

When Southern blots of total DNA of N. alata, L. esculentum and L. pennellii probed with the 750 bp
mitochondrial clone are subjected to long exposures to film (Figure 7A), several other fragments are found to
hybridize to the probe. These fragments are believed to be nuclear DNA. Other evidence that the 750 bp probe
hybridizes to nuclear DNA comes from an analysis of F2 progeny of a cross between L. esculentum and L.
pennellii. Samples of total DNA from six progeny were digested with Eco RI and probed with the 750 bp
fragment (Figure 7B). The differences observed in the hybridization patterns among the F2 progeny are most
likely due to segregation of nuclear fragments since the progeny have the same cytoplasm.

In these experiments, Southern blots were produced from restriction fragments that were separated on
0.9% agarose gels, treated for 12 minutes in 0.25 N HCl and transferred to Zetaprobe nylon membrane (Biorad,
Richmond, California) in 0.4 M NaOH. Probes were made by random priming of inserts. Filters were hybridized
at 68°C overnight and were washed to a final stringency of 1 X SSC, 0.1% SDS at 68°C.

Those skilled in the art will appreciate that the invention described herein and the methods of isolation and
identification specifically described are susceptible to variations and modifications other than as specifically
described. It is to be understood that the invention includes all such variations and modifications which fall
within its spirit and scope.

Table 2: Partial nucleotide sequence of 100 bp cDNA fragment



-12 -11 -10 -9 -8 -7 -6 -5 -4 -3
Phe Ile Leu Leu Cys Ala Leu Ser Pro Ile
TTC ATT TTG CTT TGT GCT CTT TCG CCG ATT

-2 -1 1 2 3 4 5 6 7 8
Tyr Gly Ala Phe Glu Tyr Met Gln Leu Val
TAT GGG GCT TTG GGG TAG ATG GAG CTC GT

3'-GAA ACA CGA GAA AGC GGG TAA ATA CCG CGA-5'

30 mer probe sequence
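
The probe in Table 2 is written 3' to 5', so complementing it base by base gives the 5' to 3' strand it would pair with. The short Python sketch below does only that; the helper name `complement` is hypothetical, and the probe string is taken from Table 2 as printed here.

```python
# Translation table for Watson-Crick base pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-wise complement, preserving the order of the input."""
    return seq.upper().translate(COMPLEMENT)


# Probe as listed in Table 2, read 3' -> 5' (spaces removed).
PROBE_3_TO_5 = "".join("GAA ACA CGA GAA AGC GGG TAA ATA CCG CGA".split())

# Complementing a 3'->5' strand position by position yields the
# 5'->3' strand it is expected to hybridize with.
target_5_to_3 = complement(PROBE_3_TO_5)
print(target_5_to_3)
```

Read in codons, the result spells Leu Cys Ala Leu Ser Pro Ile Tyr Gly Ala, the residues shown at positions -9 to 1 of Table 2.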
Table 3: Nucleotide sequence of the full-length cDNA coding for the 32K molecular
weight S2 protein of Nicotiana alata.

Met Ser Lys Ser Gln Leu Thr Ser Val Phe Phe Ile
GACGGA ATG TCT AAA TCA CAG CTA ACG TCA GTT TTC TTC ATT
-70 -60 -50

Leu Leu Cys Ala Leu Ser Pro Ile Tyr Gly Ala Phe Glu Tyr Met Gln Leu Val Leu Thr
TTG CTT TGT GCT CTT TCA CCG ATT TAT GGG GCT TTC GAG TAT ATG CAA CTC GTG TTA ACA
-20 1 10 20 30

Trp Pro Ile Thr Phe Cys Arg Ile Lys His Cys Glu Arg Thr Pro Thr Asn Phe Thr Ile
TGG CCA ATC ACT TTT TGC CGC ATT AAG CAT TGC GAA AGA ACA CCA ACA AAC TTT ACG ATC
40 50 60 70 80 90

His Gly Leu Trp Pro Asp Asn His Thr Thr Met Leu Asn Tyr Cys Asp Arg Ser Lys Pro
CAT GGG CTT TGG CCG GAT AAC CAC ACC ACA ATG CTA AAT TAC TGC GAT CGC TCC AAA CCC
100 110 120 130 140 150

Tyr Asn Met Phe Thr Asp Gly Lys Lys Lys Asn Asp Leu Asp Glu Arg Trp Pro Asp Leu
TAT AAT ATG TTC ACG GAT GGA AAA AAA AAA AAT GAT CTG GAT GAA CGC TGG CCT GAC TTG
160 170 180 190 200 210

Thr Lys Thr Lys Phe Asp Ser Leu Asp Lys Gln Ala Phe Trp Lys Asp Glu Tyr Val Lys
ACC AAA ACC AAA TTT GAT AGT TTG GAC AAG CAA GCT TTC TGG AAA GAC GAA TAC GTA AAG
220 230 240 250 260 270

His Gly Thr Cys Cys Ser Asp Lys Phe Asp Arg Glu Gln Tyr Phe Asp Leu Ala Met Thr
CAT GGC ACG TGT TGT TCA GAC AAG TTT GAT CGA GAG CAA TAT TTT GAT TTA GCC ATG ACA
280 290 300 310 320 330

Leu Arg Asp Lys Phe Asp Leu Leu Ser Ser Leu Arg Asn His Gly Ile Ser Arg Gly Phe
TTA AGA GAC AAG TTT GAT CTT TTG AGC TCT CTA AGA AAT CAC GGA ATT TCT CGT GGA TTT
340 350 360 370 380 390

Ser Tyr Thr Val Gln Asn Leu Asn Asn Thr Ile Lys Ala Ile Thr Gly Gly Phe Pro Asn
TCT TAT ACC GTT CAA AAT CTC AAT AAC ACG ATC AAG GCC ATT ACT GGA GGG TTT CCT AAT
400 410 420 430 440 450

Leu Thr Cys Ser Arg Leu Arg Glu Leu Lys Glu Ile Gly Ile Cys Phe Asp Glu Thr Val
CTC ACG TGC TCT AGA CTA AGG GAG CTA AAG GAG ATA GGT ATA TGT TTC GAC GAG ACG GTG
460 470 480 490 500 510

Lys Asn Val Ile Asp Cys Pro Asn Pro Lys Thr Cys Lys Pro Thr Asn Lys Gly Val Met
AAA AAT GTG ATC GAT TGT CCT AAT CCT AAA ACG TGC AAA CCA ACA AAT AAG GGG GTT ATG
520 530 540 550 560 570

Phe Pro ***
TTT CCA TGA TTAATAATATTTGTTTTATTGCATTATGCCATGTAAAAAAAAATTCAAAACCTCAAGTATAAACGTG
580 590 600 610 620 630 640

TAATCAAGACTATTAAGCACGCACTTATTGAAGACTAAAAAAAAAAAAAAAAAAAAA

ACTCGGAAGAATAAGCAAAA

686 696 706 716 726

NA-2-1 : ACACTCGGAAGAATAAGCAAAATTCTTATCAATTTATGGAAATC

GTTATTAAAAAAAAAAAAAAAAAAGGGGGACGGACTGGGAACGGTTCTTCGGGGTCCCGG

736 746 756 766 776 786

The signal sequence is underlined; positive numbering begins at the first
codon of the mature protein sequence. The differences in 3' end sequence
between the full-length NA-2-2 clone and the near full-length clone NA-2-1
are also indicated.
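
The pairing of codons and residues in Table 3 follows the standard genetic code. As a hedged illustration, the Python sketch below builds the standard codon table and translates the first twelve codons and the final three codons of the open reading frame as transcribed above; the function name `translate` is hypothetical, and the input strings are assumed to be transcribed correctly.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cdna: str) -> str:
    """Translate a coding sequence in frame 1; a trailing partial codon is ignored."""
    usable = len(cdna) - len(cdna) % 3
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(0, usable, 3))


# First twelve codons of the signal sequence and the last three codons of Table 3.
start = "".join("ATG TCT AAA TCA CAG CTA ACG TCA GTT TTC TTC ATT".split())
end = "".join("TTT CCA TGA".split())

print(translate(start))
print(translate(end))
```

The one-letter codes printed by the sketch are those listed in Table 8.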
Table 4: The nucleotide sequence of the S3 cDNA clone.



<--- AFEYMQLVLrQWPAA 
signal . . , TTA CAA TGG CCA GCA GCC 

147 

FCHTTPSPCKRIPNN
TTT TGT CAC ACC ACT CCT AGT CCT TGC AAA AGA ATT CCA AAC AAC
174 Eco RI

FTIHGLWPDNVSTML
TTC ACA ATT CAT GGG CTT TGG CCG GAT AAC GTG AGC ACA ATG CTT
219

NYCSGEDEYEKLDDD
AAT TAC TGC TGT GGC GAA GAT GAG TAC GAA AAA TTA GAT GAT GAT 
264 

KKKKDLDDRWPDLTI 
AAA AAG AAG AAA GAT CTG GAT GAC CGC TGG CCT GAC TTG ACA ATT 
309 

ARADCIEHQVFWKHE 
GCC CGA GCT GAT TGT ATC GAA CAT CAA GTT TTC TGG AAA CAT GAA 
354 

YNKHGTCCSKSYNLT
TAC AAT AAG CAT GGA ACG TGT TGT TCC AAG AGC TAC AAT CTA ACA
399

QYFDLAMALKDKFDL
CAA TAT TTT GAT TTA GCC ATG GCC TTA AAG GAC AAA TTT GAT CTT
444

LTSLRKHGIIPGNSY
TTG ACA TCT CTC AGG AAG CAT GGC ATT ATT CCT GGA AAC AGT TAT 
489 

TVQKINSTIKAITQG 
ACC GTT CAA AAA ATC AAT AGC ACC ATA AAG GCA ATC ACG CAA GGG 
534

YPNLSCTKRQMGLLE 
TAT CCT AAC CTC TCG TGC ACT AAA AGA CAA ATG GAG CTA TTG GAG 
579 

IGICFDSKVKNVIDC 
ATA GGC ATA TGT TTC GAC TCG AAG GTA AAA AAT GTG ATA GAT TGT 
624 

PHPKTCKPMGNRGIK 
CCT CAT CCT AAG ACA TGC AAA CCT ATG GGA AAT AGG GGG ATT AAG 
669 
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Table 4 (Continued) 



F P * 

TTT CCA TGA TTA TAA ATT TCT GTT TCT GTT GCT TTG AGC TGC CTA 
714 

AAA AAT AAT ACA AAA CTA ATA AGG GAT AAT CAG GAC CAT GGG ACA 
759 



ATT CTA TTA TGA AAG CCA ACA TTG TGG AAC CAT ATA TAA TTT CCA 
804 



TAT AAA TTT ATG AAA — T ATT ATT GAA CTG ACA CTT ATT TTG TGT 
849 

CAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA AAA 
894

AAA AAA AAA AA 
939 



The isolated S3 cDNA clone is near full length, but part
of the 3' end of the clone was removed during subcloning
for sequencing due to the presence of an EcoRI site (196
- 201). The sequence 5' to this site was obtained by
RNA sequencing. The N-terminal amino acid sequence was
obtained by microsequencing analysis of the isolated S3
protein.
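
An internal EcoRI site of the kind mentioned in the note above can be located by searching for the recognition sequence GAATTC. The Python sketch below is a minimal illustration using the Table 4 row that is labelled "Eco RI"; the helper name `find_sites` is hypothetical, and the offsets it reports are relative to the start of that row, not to the numbering used in the table.

```python
def find_sites(seq: str, site: str = "GAATTC") -> list:
    """Return the 0-based start offset of every occurrence of `site` in `seq`."""
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        # Resume one base further on so overlapping occurrences are not missed.
        start = seq.find(site, start + 1)
    return hits


# Row of Table 4 labelled "Eco RI", with the spaces between codons removed.
row = "".join(
    "TTT TGT CAC ACC ACT CCT AGT CCT TGC AAA AGA ATT CCA AAC AAC".split()
)
print(find_sites(row))
```

Because the site spans a codon boundary (A GA ATT C), it is only visible once the codon spacing is removed.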
Table 5: Nucleotide sequence of the S6 cDNA clone¹

MFNLPLTSVFVIF-FALSPIY
ATGTTTAACTTACCACTCACGTCAGTTTTCGTCATATTT-TTTTTGCTCTTTCGCCCATTTAT 
1 10 20 30 40 50 60 

GAFEYMQLVLQWPTAFCHTT 
GGGGCTTTCGAATACATGCAACTTCTTTTACAATGGCCAACCGCTTTTTGCCACACTACT 
signal 70 80 90 100 110 120

PCKNIPSNFTIHGLWPDNVS
CCTTGCAAAAATATTGCAAGCAACTTTACAATCCATGGACTTTGGCCGGATAACGTGAGT
130 140 150 160 170 180

TTLNFCGKEDDYNIIMDGPE
ACAACGCTGAATTTCTGTGGTAAAGAAGATGACTATAACATTATAATGGATGGACCCGAG
190 200 210 220 230 240 

KNGLYVRWPDLIREKADCMK 
AAGAATGGTCTGTATGTCCGCTGGCCTGACTTGATCAGAGAGAAAGCTGATTCTATGAAJii 
250 260 270 280 290 300 

TQNFWRREYIKHGTCCSEIY
ACGCAAAATTTCTGGAGACGTGAATACATTAAGCATGGAACGTGTTGTTCAGAGATCTAC
310 320 330 340 350 360

KQVQYFRLAMALKDKFDLLT
AATCAAGTACAATATTTTCGTTTAGCCATGGCCTTAAAAGACAAGTTTGATCTTCTGACT
370 380 390 400 410 420

SLKNHGIIRGYKYTVQKINK
TCTTTGAAAAATCATGGAATTATTCGTGGTTACAAATATACCGTTCAGAAAATCAATAAC 
430 440 450 460 470 480 

TIKTVIKGYPNLSCTKGQEL 
ACGATCAAGACAGTAACAAAAGGGTATCCTAACCTCTCGTGCACTAAAGGGCAAGAACTA 
490 500 510 520 530 540 

WEVGICFDSTAKNVIDCPNP 
TGGGAGGTTGGCATATGTTTCGACTCGACAGCGAAAAATGTAATTGATTGTCCTAATCCT 
550 560 570 580 590 600 
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Table 5 (cont.) 



KTCKTASNQGIMFP* 
AAGACATGCAAAACAGCGTCGAATCAGGGAATTATGTTTCCATGAACAAAATTGGCATTT 
610 620 630 640 650 660 



TTCTTGGTTTAGGCTACGTAAACCAAAATCCAAACCACACGAATAATCAAGAAAATCAAA 
670 680 690 700 710 720 



CAAAATTTTATTATGAAGATCAAATTGTCAAACCATATGTAAATTTGATAACAAATTTAT 
730 740 750 760 770 780 



GAAAAGTATTATTGAACTGCG 
790 800 



¹ The S6 cDNA clone does not extend to an ATG codon at the 5'
end and does not contain a poly(A) tail. It is believed that the
clone is only 2 bases short at the 5' end, with the first
nucleotide of the sequence predicted to be the last base of the
ATG start codon. The predicted bases at the 5' end of the
sequence are underlined.
Table 6: N. alata genomic sequence



GAATTCACeAGAAGAAGTGTC 
10 20 



T&TTTCTTATCATTrCTC-CTAASAAAXTCAGfV^ACTATTTSTAr-C.Cr.G 

'tO s:. co 70 tjO 



TCe-G AAGAC TT C- ATT TT T T S C A T C;i 

VO 100 liO 



TTTC.r:CCCSARACCCCC>A'iTTG3C?AG3CC:-:^Tv>:.V:r.'.;:.GC 

i«o :30 i-.o :so i^:o 



ACGA5C:C1 

i70 



CGACGA.'-l-T L73rAC7TAf:.AS.iAAi;crv.&T- t; CnSAfrC^TCx^i-C.^TrriAASA -CAAGTC'i C r-C.-...?.r 

is:* 19:* 200 • 21*:: 7.z:» Z3v 



ACA7G'JCTACACCTAACA3AT:i33ACTATCTAGGn;C{:TAC1 ATwATSTC-CCSCA^^ 

230 260 270 230 2«7;» ZOO 310 C20 



TTAACT'rACTTGCATTTTAC"A3TCCC2rT«CCTATATAAA3?5?AClv^GC:CCTAC 

Z-;0 350 3^30 370 r'.JO 3r0 lO'i 



ca:: 



rCCAAAATGCAATAA-ATZTTTCTCTCTTTCTCT" 
^10 420 430 440 



rAC.r./.:-rTTACTATTTC2TT3:T- 
-90 500 



3TTTTTCCmTA:TTTC.CTCAA-AVTGATAB-: ATAl^v:rTA3.SCT2AA7Ce -A*-T1 
510 S20 S'So 54 0 S50 S'.-O 



TA2CA3'TATC3CCTTJ:r33A33ATCCTCGATAAGCCCGAGACA3S2TCG, 
S70 5SO 590 600 610 



620 



'TC-3CC&G 
- - 64 O 



G1" 



6SO 



TGT3"1CT1 ACT3C<TTC*GA7TATCGa7TT{^TTTAAC7CGATCV33ATCGCTTTAClTCArr;C:T7 
660 fe/O 630 to90 700 7:0 72C 



730 



rAeCTCBG:3AA7A»A7CAC GT A7TTTTASAA7 AC.CAT77ATAAA7T7AATTeT rGT7 
740 750 • 760 770 7CO 790 8''»0 



AC7A77fTCACGGTAAACA3C733AAGAATL:GTGAAAATACC7ATATGA&G7TGTT7ACCAAGAATGTTGGTCA-3ATrA 
BIO 820 SoO G40 S30 360 G70 SQO 
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Table 6 (Continued) 



AC:CCCAACAAC77CAAA6CTT.V*AAATTAATTTTTTCTTroCTAAA/fCAC;ATTTAMCATTTC7S6AAA 

a?0 900 910 720 <9w*0 940 950 f&O 



ACACACAAAACAT AAAATCACCAAAlCAAGTTCCTCGATe'i T'i C AAATCATGAAATA6AAASCTAtV-.CTTCAAAAAAATA 
V70 920 990 .1000 10 lO 1020 1030 lO-n:. 



r ATCf'AKTCAC TAAG7 ACT 7':'7Cr AA7':AA7TAGCA7 AAC ACAr.ACT TC A7A7r ACAAAA/'.37 .-.d CTAT AAAA A67A737 
1050 1060 1C70 lOSO 1090 1 lOO JllO : 5 H'! 



|iCCAACAA3:j:rAGCfC^ CAT .=.0 -a:: 

1130 1140 1150 1160 1170 IISO 1190 • lOO 

ToA7I5SAA7AAAT ATATC-AGTCTT7AAG&AGCAAeCCA7AGG7T6M677GACAG"AAA6AAe7C^^ 

1210 1220 1230 1240 12S0 1260 1270 i -I rC- 



AGAGAAA&7GG7T STA^-V^A^'i AoCTCAC AAAAA77 r3C7CToATA7CACG73AA7BAA7A75A5CA7A7AACT AAAr -3% T 
129P 1300 1310 1320 1330 1340 1350 :3-*0 



7AAACCCAT CGGAG^y-.TASCr 1CAAAAAAAAAAA77CCACCCA7T7GA1 AA77::T7ACACCACTAACGAG7GA2;-C^ 
1370 1330 1390 1-^00 l«;iO 1420 K30 



A7T i:.7AC777 A7C.A77AA3 a:-.A37AA7 7AS57A7SA37C7AAT AG7 A3A7AC77A7C7AeACCAAA5AAAACGT :»1 C 3i* A 
1450 14c:0 1470 14S0 1490 ISOO ISIO iSZO 

TATA s+cxn-. 
777QA3AC77 A- CGACSPA7 ,V AAAAlI TfigMrA"|^j[A5rC77GlA7GA7ASGA AACA:-AAAl&*.e7 3T 37CCA~C"ACG 
i:;30 1540 1550 1560 1570 15S0 1590 



Met Ser Lys Ser Gln Leu Thr Ser Val Phe Phe Ile Leu Leu Cys Ala Leu Ser Pro
GA ATG TCT AAA TCA CAG CTA ACG TCA GTT TTC TTC ATT TTG CTT TGT GCT CTT TCA CCG
1610 1620 1630 1640 1650

Ile Tyr Gly Ala Phe Glu Tyr Met Gln Leu Val Leu Thr Trp Pro Ile Thr Phe Cys Arg
ATT TAT GGG GCT TTC GAG TAT ATG CAA CTC GTG TTA ACA TGG CCA ATC ACT TTT TGC CGC
1669 1679 1689 1699 1709 1719
Table 6 (Continued)



Ile Lys His Cys Glu Arg Thr Pro Thr Asn Phe Thr Ile His Gly Leu Trp Pro Asp Asn

ATT AAG CAT TGC GAA AGA ACA CCA ACA AAC TTT ACG ATC CAT GGG CTT TGG CCG GAT AAC
1729 1739 1749 1759 1769 1779

His Thr Thr Met Leu Asn Tyr Cys Asp Arg Ser Lys Pro Tyr Asn Met Phe Thr

CAC ACC ACA ATG CTA AAT TAC TGC GAT CGC TCC AAA CCC TAT AAT ATG TTC ACG GT/./ATT
1789 1799 1809 1819 1829 1839



TCTTAOTTATTTTCaGGAG::ACCl TCA/^ATTTTC AT TTC AT TTTTTCCTTTTC ATTATT^ Z^T ATtV-aVrTTT"- C -J VA'«"6 
•SSO ie60 iC'70 ::-5;V5 1390 I7'::i l^TlO 192.0 

Asp Gly Lys Lys Lys Asn Asp Leu Asp Glu Arg Trp Pro Asp Leu Thr Lys Thr
CCAACAG GAT GGA AAA AAA AAA AAT GAT CTG GAT GAA CGC TGG CCT GAC TTG ACC AAA ACC
1930 1940 1950 1960 1970 1980

Lys Phe Asp Ser Leu Asp Lys Gln Ala Phe Trp Lys Asp Glu Tyr Val Lys His Gly Thr
AAA TTT GAT AGT TTG GAC AAG CAA GCT TTC TGG AAA GAC GAA TAC GTA AAG CAT GGC ACG
1991 2001 2011 2021 2031 2041

Cys Cys Ser Asp Lys Phe Asp Arg Glu Gln Tyr Phe Asp Leu Ala Met Thr Leu Arg Asp
TGT TGT TCA GAC AAG TTT GAT CGA GAG CAA TAT TTT GAT TTA GCC ATG ACA TTA AGA GAC
2051 2061 2071 2081 2091 2101

Lys Phe Asp Leu Leu Ser Ser Leu Arg Asn His Gly Ile Ser Arg Gly Phe Ser Tyr Thr
AAG TTT GAT CTT TTG AGC TCT CTA AGA AAT CAC GGA ATT TCT CGT GGA TTT TCT TAT ACC
2111 2121 2131 2141 2151 2161

Val Gln Asn Leu Asn Asn Thr Ile Lys Ala Ile Thr Gly Gly Phe Pro Asn Leu Thr Cys
GTT CAA AAT CTC AAT AAC ACG ATC AAG GCC ATT ACT GGA GGG TTT CCT AAT CTC ACG TGC
2171 2181 2191 2201 2211 2221

Ser Arg Leu Arg Glu Leu Lys Glu Ile Gly Ile Cys Phe Asp Glu Thr Val Lys Asn Val
TCT AGA CTA AGG GAG CTA AAG GAG ATA GGT ATA TGT TTC GAC GAG ACG GTG AAA AAT GTG
2231 2241 2251 2261 2271 2281

Ile Asp Cys Pro Asn Pro Lys Thr Cys Lys Pro Thr Asn Lys Gly Val Met Phe Pro ***
ATC GAT TGT CCT AAT CCT AAA ACG TGC AAA CCA ACA AAT AAG GGG GTT ATG TTT CCA TGA
2291 2301 2311 2321 2331 2341

TTAATAATATTTGTTTTATTGCATTATGCCATGTAAAAAAAAATTCAAAACCTCAAGTATAAACGTGTAATCAAGACTA
2351 2361 2371 2381 2391 2401 2411
Table 6 (Continued)



TTAASrArGCACrTATTeAAr,ArTACACTCe5AAG^^'^§'CAr.AATrCTTATCA^ 

"^'^430 :21mO 24S0 2>»70 r.M</C' 



AC6-~A7TCTC6TCCGTCAAATArGACATACCTTGTCAATTTTCTTCTTTATTGCCCAACATC5TATCA7«;.:.TGATTGrTT 

2!3io :;s.-20 Z'S30 2540 'J5r-;o s::6C» sn-^o :>».rjo 



ACC-J'TAAAA^TGGTAA-^CACAATTAQATTTGACTrTnTGeTTTTAAAAATACGTAATTTTTT-TATuC/fAarTCT 

::i590 , r.60C> r:6io i*6i:o 26oO 26^0 :'tr.o -oi*:- 

AATAGATGGTAAGTGTAAATCAG3AAAATGASA7GAGAGCTTGAGrjATAeTATGTTATeCAA/.CCfiAe7;rOCA- 

2670 :*6£0 :?690 27 OO 27 SO 2720 :V/r.C T.?^:- 



AAAT7GAAATTA7T6TGGCGGCG37ACC-GAGCAATATATATCAATAATACCGCGGCGTTO:AGACAArAr-AGGr:-TAVT 

2750 r760 2770 S7GO r;:7«;o 2s*x> rv-» ::s:ro 



7ATG7TAAAACGSCA77777ATAAA7TATG3CGGT7CAAACeGCCAC7AG77ACACAAo7777AAA7AA77ATTVGCCC7 
2S30 2340 2830 2S60 2&70 2DS0 2CtO 2700 



C777 AT77GGAACTCCCCCA27AA7 AA7T7 AA7AC7A77AAAAACA7ATAAAA7A7AC7AAGCC77C7C TAAGCC : A AAA 
2910 2TOO 2930 29^50 IS'SO 2960 2970 29BC 

CA7A7G7AAAC76ACGGTCTTCCC7C7CTC7ATA3G3CATGTC7ACACCCCCTC7A7C7C7CTCTCTCAAAAACACGATT 
2990 3000 3010 3020 3030 30^0 3050 ' 3w£..0 

CCCCCAAA77GT77AG{:AT77A737AAG3-3ArCAGA77CCAA':7CG777A7GG7AA7GT5TT7aAA-^ 
3070 30S0 30*^0 3100 31 lO 312«.' 
Table 7: Comparison of the homologous mitochondrial
(Mt) sequence with that of the upstream
sequence of the S2 gene (Nuc).¹






> > 

Nuc ACAAAAAGTACCTATAAAAAGTATGTCCCAACAATTTAGCCTGAAATGAAAAAAAG 

Mt AGCTTGAJ^TCCCTATAAAAAGTCCGTCCCAACAATTTAGCCTGAAAAGAAAAAAAG 
10 20 30 40 50 

Nuc TGGGGTAGAAACT AAGTTTCT TTT AGATCCTTTTGAAA TCCTr::,TArAArTr:3^T-r:r: 

Mt TGGGGTAG AAGTTTCTA TTGAATTGAGTA AGATCCTTTTGAA TAGAAGATnrrATn 
60 70 80 90 100 110 



1. The S2 gene sequence presented in this Table
corresponds to the sequence of Table 6, nucleotides
1095 - 1206. The sequences are aligned for best
overlap and homologous bases are marked. The
56 bp homologous segment extends from bases 11 to 66.
The two additional regions of sequence identity are
underlined. The position of an 8 bp direct repeat is
indicated by arrows.



TABLE 8

AMINO ACID ABBREVIATIONS

A = Ala = Alanine          M = Met = Methionine
C = Cys = Cysteine         N = Asn = Asparagine
D = Asp = Aspartic Acid    P = Pro = Proline
E = Glu = Glutamic Acid    Q = Gln = Glutamine
F = Phe = Phenylalanine    R = Arg = Arginine
G = Gly = Glycine          S = Ser = Serine
H = His = Histidine        T = Thr = Threonine
I = Ile = Isoleucine       V = Val = Valine
K = Lys = Lysine           W = Try = Tryptophan
L = Leu = Leucine          Y = Tyr = Tyrosine
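
The sequence tables use both the three-letter codes (Tables 2, 3 and 6) and the one-letter codes (Tables 4 and 5) listed above. As a convenience, the following Python sketch converts a space-separated run of three-letter codes into one-letter codes. The names `THREE_TO_ONE` and `to_one_letter` are hypothetical; both "Trp", as used in the sequence tables, and "Try", as listed in Table 8, are mapped to W.

```python
# Three-letter to one-letter amino acid codes, following Table 8.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Try": "W",
    "Tyr": "Y",
}


def to_one_letter(residues: str) -> str:
    """Convert space-separated three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[code] for code in residues.split())


print(to_one_letter("Met Ser Lys Ser Gln Leu Thr"))
```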
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Claims 

1. A method for isolating and identifying a cDNA clone of an S-gene of a gametophytic self-incompatible
plant comprising the steps of:

a) preparing a cDNA clone library from mature styles of a known S-genotype of said gametophytic
self-incompatible plant and wherein said plant of known S-genotype expresses said S-gene;

b) differentially screening said cDNA clone library with a first hybridization probe comprising cDNA
prepared from mature style RNA of an S-genotype of said gametophytic self-incompatible plant
which plant expresses said S-gene and a second hybridization probe comprising cDNA prepared
from mature style RNA of an S-genotype of said gametophytic self-incompatible plant that is different
from the S-genotype of the plant employed in the preparation of said cDNA library, and which
S-genotype does not express said S-gene;

c) selecting clones from said cDNA library which hybridize more strongly to said first hybridization
probe than to said second hybridization probe;

d) rescreening said clones selected in step c) for hybridization to at least two style RNA
preparations from different S-genotypes of said self-incompatible plant wherein at least one of said
preparations is from an S-genotype which expresses said S-gene and at least one of said style RNA
preparations is from an S-genotype which does not express said S-gene; and

e) selecting and isolating those clones which hybridize more strongly to style RNA preparations
from S-genotypes which express said S-gene than to style RNA preparations from S-genotypes
which do not express said S-gene, thereby identifying and isolating a cDNA clone of said S-gene of
said gametophytic self-incompatible plant.

2. The method of claim 1 wherein said gametophytic self-incompatible plant is of the genus Nicotiana.

3. The method of claim 2 wherein said gametophytic self-incompatible plant is Nicotiana alata.

4. The method of claim 1 wherein the S-genotypes employed to prepare said cDNA library and said first
and second cDNA hybridization probes are homozygous S-genotypes.
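
The selection logic of steps c) and e) of claim 1 can be sketched in Python as follows. This is an illustration only and not part of the claims: the clone names and signal values are invented, the function name `select_clones` is hypothetical, and the `min_ratio` threshold is an assumption, since the claims require only that hybridization be "more strongly" to one probe than to the other.

```python
def select_clones(signals: dict, min_ratio: float = 2.0) -> list:
    """Pick clones whose signal with the first probe exceeds the second.

    `signals` maps a clone name to a pair of hybridization signals:
    (first probe, second probe). `min_ratio` is an assumed cut-off.
    """
    selected = []
    for clone, (first, second) in signals.items():
        if first <= 0:
            continue  # no signal with the first probe
        if second == 0 or first / second >= min_ratio:
            selected.append(clone)
    return sorted(selected)


# Hypothetical signal intensities in arbitrary units.
example = {
    "clone_a": (9.0, 1.0),  # much stronger with the first probe
    "clone_b": (4.0, 4.2),  # similar with both probes
    "clone_c": (3.0, 0.0),  # detected only with the first probe
}
print(select_clones(example))
```

The same comparison applies to the rescreening of step d), with the two values taken from style RNA preparations of expressing and non-expressing S-genotypes.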
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FIGURE 3 



TISSUES OF N.ALATA SpSg 
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P = petal 
L = leaf 
O = ovary
A = anther 
S = style 
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FIGURE 4 
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FIGURE 5 
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